ABSTRACT

In  this  paper,  the  author  gives  a  review  of  the  literature  on  various  techniques  for 
determination  of  the  ranks  of  regression  matrix  and  canonical  correlation  matrix.  Also, 
methods  of  selection  of  important  original  variables  under  multivariate  regression  and 
canonical  correlation  models  are  reviewed.  The  methods  reviewed  involve  not  only  tests  of 
hypotheses  but  also  model  selection  methods  based  upon  information  theoretic  criteria. 

Key words and phrases: Contingency tables, correlated multivariate regression equations model, discriminant analysis, econometrics, likelihood ratio test, linear and structural relations, pattern recognition, random effects model, rank of canonical correlation matrix, rank of regression matrix, selection of variables, and structure of interaction.
1.  INTRODUCTION


The techniques of multivariate regression analysis and canonical correlation analysis play a very important role in the analysis of multivariate data in many disciplines. The object of this paper is to review some of the work in the literature on reduction of dimensionality in these areas. The main emphasis is on techniques for determining the ranks of the regression matrix and the canonical correlation matrix. We also review methods for selecting important original variables in multivariate regression analysis and canonical correlation analysis. This review is by no means exhaustive.

The sample regression matrix is widely used to estimate the population regression matrix under the classical multivariate regression model. However, this estimate is not the maximum likelihood estimate, even when the underlying distribution is multivariate normal, if the population regression matrix is not of full rank. So, it is useful to make a preliminary test to determine the rank of the regression matrix and to use this information in determining the final estimate of the regression matrix. Determining the rank of the regression matrix is also useful for finding the number of linear relations between the elements of the regression matrix. Also, the problem of determining the number of important discriminant functions is a special case of the problem of determining the rank of the regression matrix. Determining the rank of the canonical correlation matrix is useful in studying the relationship between two sets of variables. The number of significant canonical correlations is equal to the number of pairs of canonical variables which are adequate for studying the relationship between the two sets of variables. When the underlying distribution is multivariate normal, the rank of the canonical correlation matrix is equal to the rank of the regression matrix under a conditional model.

We now mention briefly the importance of selecting original variables. In multivariate regression analysis, it is of interest to select a small number of original variables which are adequate for prediction. Similarly, in canonical correlation analysis, it is of interest to select important original variables which are adequate to explain the relationship between two sets of variables. A brief outline of the contents of the paper is given below.


In Section 2, we give some preliminaries which are needed in the sequel. In Section 3, we first discuss the problem of determining the number of important discriminant functions, starting with the work of Fisher (1939). Then, we discuss the test procedures for the rank of the regression matrix under the classical multivariate regression model. In particular, we review the work of Anderson (1951), Fujikoshi (1974), Krishnaiah, Lin and Wang (1985), Rao (1973) and Tintner (1945). In Section 4, we review the recent work of Bai, Krishnaiah and Zhao (1986a) on estimation of the rank of the regression matrix using model selection methods. These estimates are strongly consistent. In these methods, information theoretic criteria are used to select one of several models, where each model is associated with a particular rank. In Section 5, we discuss the problem of determining the rank of the interaction matrix in two-way classification with one observation per cell. The problem of determining the rank of the covariance matrix of the random effects in the one-way multivariate random effects model is discussed in Section 6 for the case when the sample sizes of the various groups are equal. The modified likelihood ratio test (LRT) procedure derived by Rao (1983) and the LRT procedure derived by Anderson (1984) and Schott and Saw (1984) for this problem are reviewed. The model selection methods recently proposed by Zhao, Krishnaiah and Bai (1985a,b) for this problem are also reviewed; these methods give strongly consistent estimates of the rank of the covariance matrix of the random effects when the error covariance matrix is known or unknown. A brief review of some of the methods of selection of the original variables under the multivariate regression model is given in Section 7. In particular, we discuss Roy's largest root test, the $T^2_{\max}$ test, tests for additional information (Rao (1948)), and finite intersection tests (Krishnaiah (1965)). A critical review of the widely used stepwise techniques for selection of original variables in discriminant analysis is given in Section 8. We give the reasons why these stepwise methods should not be used.

In Section 9, we review the work of Bartlett (1948), Hsu (1948a,b), Fujikoshi (1974), Lawley (1956), Krishnaiah, Lin and Wang (1985) and others on tests for the rank of the canonical correlation matrix. The recent work of Bai, Krishnaiah and Zhao (1986a) on this problem using model selection methods is reviewed in Section 10; these methods yield strongly consistent estimates of the rank of the canonical correlation matrix. In Section 11, we review some methods of selection of original variables in canonical correlation analysis. Finally, in Section 12, we discuss the problems of reduction of dimensionality in connection with studying the structure of dependence in two-way contingency tables. The work of Lancaster (1969), O'Neil ((1978a), (1978b), (1980)), and Bhaskar Rao, Krishnaiah and Subramanyam (1985) is reviewed in that section. The recent work of Bai, Krishnaiah and Zhao (1986b) using the model selection approach is also reviewed.


2.  PRELIMINARIES 

The following notation is used throughout this paper. The transpose of a matrix $A$ is denoted by $A'$, whereas the inverse of a square matrix $B$ is denoted by $B^{-1}$. The conjugate transpose of a complex matrix $C$ is denoted by $C^*$.

We now define the elliptically symmetric distribution, the complex multivariate normal distribution and the complex elliptically symmetric distribution. A random vector $x : p \times 1$ is said to have an elliptically symmetric distribution if its density is of the form

$$ f(x) = |\Sigma|^{-1/2}\, h\big((x-\mu)'\Sigma^{-1}(x-\mu)\big). \tag{2.1} $$


For some details on the elliptically symmetric distribution, the reader is referred to Kelker (1970). The multivariate normal, multivariate t and multivariate Cauchy distributions are special cases of the elliptically symmetric distribution. A $p \times 1$ random vector $z = x_1 + i x_2$ is said to be distributed as complex multivariate normal if $x' = (x_1', x_2')$ is distributed as multivariate normal with mean vector $\mu_0$ and covariance matrix $\Sigma_0$, where

$$ \Sigma_0 = \begin{pmatrix} \Sigma_1 & \Sigma_2 \\ -\Sigma_2 & \Sigma_1 \end{pmatrix} $$

and $\Sigma_1$ is of order $p \times p$. The complex multivariate normal distribution was considered by Wooding (1959), Goodman (1963) and others. The density function of the complex multivariate normal distribution is of the form

$$ f(z) = \pi^{-p}\, |\Sigma|^{-1} \exp\big[-(z-\mu)^*\Sigma^{-1}(z-\mu)\big] \tag{2.2} $$


where $\Sigma = 2(\Sigma_1 - i\Sigma_2)$ and $\mu = \mu_1 + i\mu_2$. For a review of the literature on some multivariate distributions, the reader is referred to Krishnaiah (1976). We now define the complex elliptically symmetric distribution introduced by Krishnaiah and Lin (1986). A $p \times 1$ random vector $z = x_1 + i x_2$ is said to be distributed as complex elliptically symmetric if $x' = (x_1', x_2')$ has an elliptically symmetric distribution with density

$$ f(x) = |\Sigma_0|^{-1/2}\, h\big((x-\mu_0)'\Sigma_0^{-1}(x-\mu_0)\big) \tag{2.3} $$


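where $\mu_0' = (\mu_1', \mu_2')$,

$$ \Sigma_0 = \begin{pmatrix} \Sigma_1 & \Sigma_2 \\ -\Sigma_2 & \Sigma_1 \end{pmatrix} $$

and $\Sigma_1$ is of order $p \times p$. The density of $z$ is of the form

$$ g(z) = |\Sigma|^{-1}\, h\big((z-\mu)^*\Sigma^{-1}(z-\mu)\big) \tag{2.4} $$

where $\Sigma = 2(\Sigma_1 - i\Sigma_2)$ and $\mu = \mu_1 + i\mu_2$. The complex multivariate normal and complex multivariate t distributions are special cases of the complex elliptically symmetric distribution.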


3.  TESTS  FOR  THE  RANK  OF  THE  REGRESSION  MATRIX 

In  this  section  we  first  discuss  procedures  for  testing  the  hypothesis  on  the  number 
of  significant  discriminant  functions  since  this  is  a  special  case  of  the  problem  of  testing 
for  the  rank  of  regression  matrix. 


Let $x_1, \ldots, x_k$ be distributed independently as multivariate normal with mean vectors $\mu_1, \ldots, \mu_k$ and a common covariance matrix $\Sigma$. Also, let $x_{ij}$ $(j = 1, 2, \ldots, n_i)$ denote the j-th independent observation on $x_i$. Then, the between group sums of squares and cross products (SP) matrix is given by

$$ S_b = \sum_{i=1}^{k} n_i (\bar{x}_{i.} - \bar{x}_{..})(\bar{x}_{i.} - \bar{x}_{..})' \tag{3.1} $$

whereas the within group SP matrix is given by

$$ S_w = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})(x_{ij} - \bar{x}_{i.})' \tag{3.2} $$

where

$$ n_i \bar{x}_{i.} = \sum_{j=1}^{n_i} x_{ij}, \qquad n \bar{x}_{..} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}, $$

and $n = n_1 + \ldots + n_k$. Now, let $\ell_1 > \ldots > \ell_p$ denote the eigenvalues of $S_b S_w^{-1}$. Also, let

$$ \Omega = \sum_{i=1}^{k} n_i (\mu_i - \bar{\mu})(\mu_i - \bar{\mu})' \tag{3.3} $$

where $n\bar{\mu} = n_1\mu_1 + \ldots + n_k\mu_k$. The rank of $\Omega$ is equal to the number of significant discriminant functions. Fisher (1939) proposed to use $T = (\ell_{r+1} + \ldots + \ell_s)$ as a test statistic for testing the hypothesis that the rank of $\Omega$ is equal to $r$, where $s = \min(p, k-1)$. In general, we can use suitable functions $\psi(\ell_{r+1}, \ldots, \ell_s)$ of $\ell_{r+1}, \ldots, \ell_s$ to test for the rank of $\Omega\Sigma^{-1}$. For example, $\psi(\ell_{r+1}, \ldots, \ell_s)$ may be $\ell_{r+1}$. But the distributions of these statistics involve $\lambda_1, \ldots, \lambda_r$ as nuisance parameters, where $\lambda_1 > \ldots > \lambda_p$ are the eigenvalues of $\Omega\Sigma^{-1}/n$.
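The quantities in (3.1) and (3.2) and Fisher's statistic can be computed directly from grouped data. The following is a minimal sketch, assuming NumPy is available; the function names `discriminant_eigenvalues` and `fisher_statistic` are hypothetical and are introduced here only for illustration.

```python
import numpy as np


def discriminant_eigenvalues(groups):
    """Eigenvalues of S_b S_w^{-1}, in decreasing order.

    `groups` is a list of arrays; the i-th array has shape (n_i, p) and
    holds the n_i observations on the i-th population.
    """
    all_obs = np.vstack(groups)
    grand_mean = all_obs.mean(axis=0)
    p = all_obs.shape[1]
    s_b = np.zeros((p, p))  # between group SP matrix, eq. (3.1)
    s_w = np.zeros((p, p))  # within group SP matrix, eq. (3.2)
    for g in groups:
        mean_i = g.mean(axis=0)
        diff = (mean_i - grand_mean)[:, None]
        s_b += g.shape[0] * (diff @ diff.T)
        centered = g - mean_i
        s_w += centered.T @ centered
    # S_w^{-1} S_b has the same eigenvalues as S_b S_w^{-1}.
    eig = np.linalg.eigvals(np.linalg.solve(s_w, s_b))
    return np.sort(eig.real)[::-1]


def fisher_statistic(eigenvalues, r, k):
    """T = l_{r+1} + ... + l_s with s = min(p, k - 1)."""
    s = min(len(eigenvalues), k - 1)
    return float(np.sum(eigenvalues[r:s]))
```

Since $S_b$ has rank at most $k-1$, at most $s = \min(p, k-1)$ of the returned eigenvalues differ from zero, up to rounding error.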

We now discuss the asymptotic joint distribution of the eigenvalues of $S_b S_w^{-1}$ derived by Bai, Krishnaiah and Liang (1984), since it is useful in implementing some test procedures for determining the rank of $\Omega$ under certain conditions. For each $i$, let $x_{i1}, \ldots, x_{in_i}$ be distributed independently and identically as elliptically symmetric with density

$$ f(x) = |\Sigma|^{-1/2}\, h\big((x-\mu_i)'\Sigma^{-1}(x-\mu_i)\big). \tag{3.4} $$

Also, let $\ell_1 > \ldots > \ell_p$ denote the eigenvalues of $S_b S_w^{-1}$. In addition, let $\theta_1 \geq \ldots \geq \theta_p$ denote the eigenvalues of $\Omega\Sigma^{-1}$, whose multiplicities are given below:

$$ \theta_1 = \ldots = \theta_{p_1^*} = n\delta_1 $$
$$ \theta_{p_1^*+1} = \ldots = \theta_{p_2^*} = n\delta_2 $$
$$ \vdots $$
$$ \theta_{p_{t-1}^*+1} = \ldots = \theta_{p_t^*} = n\delta_t $$
$$ \theta_{p_t^*+1} = \ldots = \theta_p = 0 $$

where $p_j^* = p_1 + \ldots + p_j$ $(j = 1, 2, \ldots, t+1)$, $r = p_t^*$, $p = p_1 + \ldots + p_{t+1}$ and $p_0^* = 0$. In addition, let

$$ u_i = \sqrt{n}\,(2\delta_h^2 + 4\delta_h)^{-1/2}(\ell_i - \delta_h) $$
$$ u_{r+j} = n\,\ell_{r+j} $$

where $h = 1, 2, \ldots, t$, $i = p_{h-1}^*+1, \ldots, p_h^*$ and $j = 1, 2, \ldots, s-r$, and where $r$ denotes the number of non-zero eigenvalues of $\Omega\Sigma^{-1}$. Now, let $n_i = n q_i$ for $i = 1, 2, \ldots, k$. Then, Bai, Krishnaiah and Liang (1984) derived the following expression for the limiting distribution of $u_1, \ldots, u_p$ as $n \to \infty$:
Here $\Pi_j(\cdot)$ $(j = 1, 2, \ldots, t+1)$ denotes the joint distribution of the eigenvalues of the random matrix $A_j$. For $j = 1, 2, \ldots, t$, the elements of $A_j$ are distributed independently as normal with zero means; the variances of the diagonal elements are equal to 1 whereas the variances of the off-diagonal elements are equal to 1/2. In other words, the random matrices $A_1, \ldots, A_t$ are distributed as central Gaussian matrices. Also, $A_{t+1} : (s-r) \times (s-r)$ is distributed as a central Wishart matrix with $(k-1-r)$ degrees of freedom. Computational aspects of the percentage points of the individual eigenvalues of the central Gaussian matrix and central Wishart matrix are discussed in Krishnaiah (1980). When the underlying distribution is multivariate normal, the expression (3.7) was derived by Hsu (1941). W. Q. Liang (personal communication) found an error in the proof of Hsu. However, Bai (1984) pointed out that the final result of Hsu is correct. Bai, Krishnaiah and Liang (1984) showed that the above result is true even when the observations are distributed independently as elliptically symmetric. From the result of Bai, Krishnaiah and Liang (1984) it follows that, when $r$ is the rank of $\Omega$, $n(\ell_{r+1} + \ldots + \ell_s)$ is asymptotically distributed as $\chi^2$ with $(s-r)(k-1-r)$ degrees of freedom even when the underlying distribution of the observations is elliptically symmetric.
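The chi-square limit of $n(\ell_{r+1} + \ldots + \ell_s)$ suggests a simple way to screen candidate ranks. The sketch below is only an illustration and is not the sequential procedure of Bai, Krishnaiah and Liang (1984): it compares the trace statistic with an approximate chi-square quantile for $r = 0, 1, \ldots$ in turn, with no control of the overall error rate. The function names are hypothetical, and the quantile uses the Wilson-Hilferty approximation (an assumption made here to avoid dependencies beyond the standard library).

```python
from statistics import NormalDist


def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def trace_statistic(eigenvalues, n, k, r):
    """Return n(l_{r+1} + ... + l_s) and its degrees of freedom (s-r)(k-1-r)."""
    s = min(len(eigenvalues), k - 1)
    stat = n * sum(eigenvalues[r:s])
    df = (s - r) * (k - 1 - r)
    return stat, df


def screen_rank(eigenvalues, n, k, level=0.05):
    """Smallest r whose trace statistic does not exceed the approximate
    upper `level` point; returns s if every candidate is rejected."""
    s = min(len(eigenvalues), k - 1)
    for r in range(s):
        stat, df = trace_statistic(eigenvalues, n, k, r)
        if stat <= chi2_quantile_wh(1.0 - level, df):
            return r
    return s
```

The eigenvalues are assumed to be sorted in decreasing order, as in the text.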

Bai, Krishnaiah and Liang (1984) proposed the following sequential procedure for the rank of $\Omega$ when $n_1, \ldots, n_k$ tend to infinity such that $(n_1/n), \ldots, (n_k/n)$ tend to (say) $q_1, \ldots, q_k$ respectively. The hypothesis $\Omega = 0$ is accepted or rejected according as

$$ \ell_1 \lessgtr c_{\alpha_1} \tag{3.8} $$

where

$$ P[\ell_1 \leq c_{\alpha_1} \mid \Omega = 0] = (1 - \alpha_1). \tag{3.9} $$

If $\Omega = 0$ is accepted, we do not proceed further. If $\Omega = 0$ is rejected, we accept or reject $H_1$ according as

$$ \ell_2 \lessgtr c_{\alpha_2} \tag{3.10} $$

where

$$ P[\ell_2 \leq c_{\alpha_2} \mid H_1;\ \ell_1 > c_{\alpha_1}] = (1 - \alpha_2) \tag{3.11} $$


and $H_t$ denotes the hypothesis that the rank of $\Omega$ is $t$. When $H_1$ is true, the distribution of $\ell_2$ is independent of $\ell_1$. If $H_1$ is accepted, we do not proceed further. Otherwise, we accept or reject $H_2$ according as


$$ \ell_3 \lessgtr c_{\alpha_3} \tag{3.12} $$

where

$$ P[\ell_3 \leq c_{\alpha_3} \mid H_2] = (1 - \alpha_3). \tag{3.13} $$

We continue this method until a decision is made about the rank of $\Omega$.

We now discuss the problem of testing for the rank of the regression matrix. Consider the model

$$ Y = XB + E \tag{3.14} $$

where the rows of $E : n \times p$ are distributed as multivariate normal with mean vector 0 and covariance matrix $\Sigma$. Also, let $X : n \times q$ denote the design matrix and $B : q \times p$ the regression matrix. We assume that $q \geq p$. Tintner (1945) derived the LRT statistic for the rank of $B$ when $\Sigma$ is known. Anderson (1951) derived the following expression for the LRT statistic to test the hypothesis $H_r$, which states that the rank of $B$ is $r$:

$$ L_1 = \prod_{j=r+1}^{p} (1 + \ell_j)^{-n/2} \tag{3.15} $$

where $\ell_1 > \ldots > \ell_p$ denote the eigenvalues of $S_1 S^{-1}$ and

$$ S_1 = Y'X(X'X)^{-1}X'Y \tag{3.16} $$

$$ S = Y'[I - X(X'X)^{-1}X']Y. \tag{3.17} $$


Fujikoshi (1977) derived expressions for the asymptotic distributions of the test statistics $m_1 T_1$, $m_2 T_2$ and $m_3 T_3$ where

$$ T_1 = \sum_{j=r+1}^{p} \log(1 + \ell_j), \qquad T_2 = \sum_{j=r+1}^{p} \ell_j, \qquad T_3 = \sum_{j=r+1}^{p} \{\ell_j/(1 + \ell_j)\}. \tag{3.18} $$

Here $m_1$, $m_2$ and $m_3$ are certain correction factors. We may choose $m_i$ to be equal to $n$. In deriving the asymptotic distributions, it was assumed that $\lim_{n \to \infty} (\Omega/n) = O(1)$ where $\Omega = B'(X'X)B\Sigma^{-1}$. The first terms in the asymptotic distributions of $nT_1$, $nT_2$ and $nT_3$, when the null hypothesis is true, are chi-square distributions with $(p-r)(q-r)$ degrees of freedom. Fujikoshi also derived nonnull distributions of the above test statistics in terms of the normal density and its derivatives when the eigenvalues of $\Omega$ have multiplicities.
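The statistics in (3.18) are simple functions of the eigenvalues of $S_1 S^{-1}$. The following is a minimal sketch of their computation from a data matrix and a design matrix, assuming NumPy is available and that $X'X$ and $S$ are nonsingular; the function name `regression_rank_statistics` is hypothetical. No correction factors $m_i$ are applied.

```python
import numpy as np


def regression_rank_statistics(Y, X, r):
    """Eigenvalues of S_1 S^{-1} and the statistics T_1, T_2, T_3 of (3.18).

    Y is n x p, X is n x q, and r is the hypothesized rank of B.
    Also returns the chi-square degrees of freedom (p - r)(q - r).
    """
    p = Y.shape[1]
    q = X.shape[1]
    fitted = X @ np.linalg.solve(X.T @ X, X.T @ Y)  # X (X'X)^{-1} X' Y
    s1 = Y.T @ fitted                               # eq. (3.16)
    resid = Y - fitted
    s = resid.T @ resid                             # eq. (3.17)
    # S^{-1} S_1 has the same eigenvalues as S_1 S^{-1}.
    ell = np.sort(np.linalg.eigvals(np.linalg.solve(s, s1)).real)[::-1]
    ell = np.clip(ell, 0.0, None)                   # guard against rounding
    tail = ell[r:]
    t1 = float(np.sum(np.log1p(tail)))
    t2 = float(np.sum(tail))
    t3 = float(np.sum(tail / (1.0 + tail)))
    return ell, t1, t2, t3, (p - r) * (q - r)
```

Because $x/(1+x) \leq \log(1+x) \leq x$ for $x \geq 0$, the three statistics always satisfy $T_3 \leq T_1 \leq T_2$, which gives a quick sanity check on any implementation.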

Recently, Krishnaiah, Lin and Wang (1985a) derived the LRT statistics for testing hypotheses on the rank of B when the underlying distribution is elliptically symmetric; these authors have also investigated the asymptotic distributions of the above statistics. A review of their work is given below.

Let $E$ be distributed as elliptically symmetric with density

$$ f(E) = |\Sigma|^{-n/2}\, h(\mathrm{tr}\, \Sigma^{-1}E'E) \tag{3.19} $$

where $h(x)$ is a strictly decreasing and differentiable function of $x$. Also, let

$$ A = CB \tag{3.20} $$

where $C : u \times k$ is known and of rank $u$. Let $H_{1r}$ denote the hypothesis that the rank of $A$ is $r$, whereas $H_{2r}$ denotes the hypothesis that the rows of $A$ lie in an r-dimensional plane in p-dimensional space. Now, let $H_r(a)$ denote the set of $n \times p$ matrices of the form $L = (GF + ab')D$ where $|G'G| \neq 0$, $FF' = I_r$, $D : p \times p$ is any positive definite matrix and $b$ is any $p \times 1$ vector. Then $H_{1r}$ denotes the hypothesis that $A \in H_r(0)$ and $H_{2r}$ denotes the hypothesis that $A \in H_r(1)$ where $1' = (1, \ldots, 1)$. Now, let

$$ M = C(X'X)^{-1}C' $$
$$ \hat{B} = (X'X)^{-1}X'Y $$
$$ S_h(\hat{B}) = (C\hat{B})'M^{-1}(C\hat{B}) $$
$$ S_h^*(\hat{B}) = (C\hat{B})'\{M^{-1} - M^{-1}1(1'M^{-1}1)^{-1}1'M^{-1}\}C\hat{B} $$
$$ S = Y'(I - X(X'X)^{-1}X')Y \tag{3.21} $$


Let $T_4$ and $T_5$ denote the LRT statistics for testing the hypothesis $H_{1r}$ against $H_{1r'}$ for some $r' > r$ when $\Sigma$ is known and unknown respectively. Then

$$ T_4 = \frac{h(\phi_{r+1} + \ldots + \phi_s + \mathrm{tr}\, \Sigma^{-1}S)}{h(\mathrm{tr}\, \Sigma^{-1}S)} \tag{3.22} $$

$$ T_5 = \prod_{j=r+1}^{s} (1 + d_j)^{-n/2} \tag{3.23} $$

where $s = \min(u, p)$, $\phi_1 > \ldots > \phi_s$ are the non-zero eigenvalues of $S_h(\hat{B})\Sigma^{-1}$ and $d_1 > \ldots > d_s$ are the non-zero eigenvalues of $S_h(\hat{B})S^{-1}$.


Next, let $T_6$ and $T_7$ denote the LRT statistics for testing the hypothesis $H_{2r}$ against $H_{2r'}$ for some $r' > r$ when $\Sigma$ is known and unknown respectively. Then

$$ T_6 = \frac{h(\psi_{r+1} + \ldots + \psi_s + \mathrm{tr}\, \Sigma^{-1}S)}{h(\mathrm{tr}\, \Sigma^{-1}S)} \tag{3.24} $$

$$ T_7 = \{(1 + \tilde{\ell}_{r+1}) \ldots (1 + \tilde{\ell}_s)\}^{-n/2} \tag{3.25} $$

where $s = \min(u-1, p)$, $\psi_1 > \ldots > \psi_s$ are the non-zero eigenvalues of $S_h^*(\hat{B})\Sigma^{-1}$ and $\tilde{\ell}_1 > \ldots > \tilde{\ell}_s$ are the non-zero eigenvalues of $S_h^*(\hat{B})S^{-1}$. Krishnaiah, Lin and Wang (1985a) also derived the LRT statistics analogous to $T_4$, $T_5$, $T_6$ and $T_7$ when the underlying distribution is complex elliptically symmetric. The above authors also derived asymptotic joint distributions of $(d_1, \ldots, d_s)$ and $(\tilde{\ell}_1, \ldots, \tilde{\ell}_s)$. On the basis of the above results, they pointed out that $-2\log T_5$ and $-2\log T_7$ are distributed asymptotically as chi-square. When the underlying distribution is multivariate normal, Rao (1973) derived $T_6$.
6 

In a number of situations, it may not be realistic to assume that the joint distribution of the observations $Y$ is elliptically symmetric. It is more realistic to assume that the rows of $E$ are distributed independently and identically as elliptically symmetric. The two situations described above become identical when the underlying distribution is multivariate normal. Krishnaiah, Lin and Wang (1985a) have derived asymptotic joint distributions of $(d_1, \ldots, d_s)$ and $(\tilde{\ell}_1, \ldots, \tilde{\ell}_s)$ when the rows of $E$ are distributed independently as elliptically symmetric with mean vector 0 and the same dispersion matrix.


4.  INFERENCE ON THE RANK OF REGRESSION
MATRIX USING MODEL SELECTION METHODS

In the model (3.14), we assume that the rows of $E$ are distributed independently and identically with mean vector 0 and covariance matrix $\Sigma$. Also, let $A = CB$ be as defined in the preceding section. In the preceding section, we discussed the problem of testing the hypothesis that the rank of $A$ is $r$, where $r$ is specified. But situations often arise in which the experimenter does not know which of the hypotheses $H_{10}, H_{11}, \ldots, H_{1u}$ to test. In these situations, it is of interest to select one of the models $M_0, M_1, \ldots, M_u$, where $M_j$ denotes the model that the rank of $A$ is $j$. We now give a review of the recent work of Bai, Krishnaiah and Zhao (1986a) on the determination of the rank of $A$ using model selection methods. Let


$$ L(r) = (1/2) \sum_{j=r+1}^{s} \phi_j + rC_n \tag{4.1} $$

where $-\tfrac{1}{2}(\phi_{r+1} + \ldots + \phi_s)$ is the logarithm of the LRT statistic for testing the hypothesis that the rank of $A$ is $r$ when $\Sigma$ is known and the underlying distribution is multivariate normal. The statistics $\phi_j$ are as defined in the preceding section. Also, $C_n$ satisfies the following conditions:

$$ \text{(i)} \quad \lim_{n \to \infty} \{C_n/\log n\} = \infty $$
$$ \text{(ii)} \quad \lim_{n \to \infty} \{C_n/\lambda_*\} = 0 \tag{4.2} $$
$$ \text{(iii)} \quad \lim_{n \to \infty} \{\lambda_*/\log n\} = \infty $$

where $\lambda_*$ denotes the smallest eigenvalue of $X'X$. Then, according to the procedure of Bai, Krishnaiah and Zhao (1986a), the rank of $A$ when $\Sigma$ is known is estimated by $\hat{q}$ where

$$ L(\hat{q}) = \min\{L(0), L(1), \ldots, L(s)\}. \tag{4.3} $$

The above authors also showed that $\hat{q}$ defined above is a consistent estimate of the rank of $A$.
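The minimization in (4.3) is a search over $s+1$ candidate ranks. The following is a minimal sketch of that search, assuming the eigenvalues $\phi_1 > \ldots > \phi_s$ have already been computed; the function name `select_rank` is hypothetical, and the penalty $C_n = (\log n)^2$ used in the usage note is only one illustrative choice that grows faster than $\log n$ (whether it also satisfies condition (ii) depends on how fast $\lambda_*$ grows).

```python
import math


def select_rank(phi, c_n):
    """Minimize L(r) = (1/2) * sum_{j > r} phi_j + r * c_n over r = 0, ..., s.

    `phi` holds phi_1 >= ... >= phi_s in decreasing order; `c_n` is the
    penalty C_n. Returns the minimizing rank and the list of L(r) values.
    """
    s = len(phi)
    criteria = [0.5 * sum(phi[r:]) + r * c_n for r in range(s + 1)]
    q_hat = min(range(s + 1), key=criteria.__getitem__)
    return q_hat, criteria


# Hypothetical example values, for illustration only.
example_n = 200
example_phi = [250.0, 90.0, 3.0, 1.0]
example_rank, example_criteria = select_rank(example_phi, math.log(example_n) ** 2)
```

With these illustrative values the two large eigenvalues outweigh the penalty while the two small ones do not, so the criterion is smallest at rank 2.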


When $\Sigma$ is unknown, let

$$ L^*(r) = \frac{n}{2} \sum_{j=r+1}^{s} \log(1 + d_j) + rC_n \tag{4.4} $$

where $d_1 > \ldots > d_s$ are the first $s$ largest eigenvalues of $S_h(\hat{B})S^{-1}$ defined in the preceding section and $C_n$ satisfies the following conditions:

$$ \text{(i)} \quad \lim_{n \to \infty} (C_n/\log n) = \infty $$
$$ \text{(ii)} \quad \lim_{n \to \infty} (C_n) < (n/3)\log 2 \tag{4.5} $$
$$ \text{(iii)} \quad \lim_{n \to \infty} (C_n/\lambda_*) = 0 $$

We also make the following assumptions on $\lambda^*$ (the largest eigenvalue of $X'X$) and $\lambda_*$:

$$ \text{(i)} \quad \lim_{n \to \infty} (\lambda_*/\log n) = \infty $$
$$ \text{(ii)} \quad \lambda^* = O(n \log n/\log\log n) \tag{4.6} $$

Then, Bai, Krishnaiah and Zhao (1986a) proposed using $\hat{q}$ as an estimate of the rank of $B$ where

$$ L^*(\hat{q}) = \min\{L^*(0), \ldots, L^*(s)\}. $$

They also proved that $\hat{q}$ is a consistent estimate of the rank of $B$.


We may consider alternative model selection criteria similar to those
considered by Akaike (1972), Rissanen (1978) and Schwarz (1978) in some
other problems.

Next  consider  the  case  when  X  is  also  stochastic  and  the  rows  of 
(Y  X)  are  distributed  independently  as  multivariate  normal  with  mean  vector 
0  and  unknown  covariance  matrix.  When  B  is  not  of  full  rank,  Izenman 
(1974)  considered  the  problem  of  estimation  of  B  and  asymptotic  distribution 
of  the  estimate  of  B.  We  can  propose  model  selection  procedures,  similar 
to  those  discussed  in  the  present  section,  to  determine  the  rank  of  B. 


5.  REDUCTION  OF  DIMENSIONALITY  UNDER  FANOVA  MODEL 


Consider the following two-way classification model with one observation per cell:

$$ x_{ij} = \mu + \alpha_i + \beta_j + \eta_{ij} + \epsilon_{ij} \tag{5.1} $$

for $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, s$, where

$$ \sum_{i} \alpha_i = \sum_{j} \beta_j = \sum_{i} \eta_{ij} = \sum_{j} \eta_{ij} = 0. \tag{5.2} $$

Here $\mu$, $\alpha_i$, $\beta_j$ and $\eta_{ij}$ denote the general mean, the effect due to the i-th row, the effect due to the j-th column, and the interaction in the i-th row and j-th column respectively. Without loss of generality, we assume that $r \leq s$. The problem of finding the rank of the interaction matrix $\eta = (\eta_{ij})$ is of interest and has received attention in the literature. The usual F test statistics to test the hypotheses of no row effect and no column effect were proposed in the literature under the assumption of no interactions. If there is interaction, then the F statistics are no longer distributed as central F distributions even when the null hypotheses are true, and so the usual tests are no longer valid. So, it is of interest to test the hypothesis that the rank of $\eta$ is zero; this problem is known in the literature as testing for additivity. Fisher and MacKenzie (1923), Tukey (1949) and Williams (1952) are the early workers on the problem of testing for additivity when $\eta$ has special structures. When $\eta \neq 0$, knowledge of the rank of $\eta$ will help to estimate the parameters more efficiently. So, it is of interest to test for the rank of $\eta$. We will now discuss this problem.

Suppose $\eta$ is of rank $c$. Then it is known, by the singular value decomposition of the matrix, that

$$ \eta = \theta_1 u_1 v_1' + \ldots + \theta_c u_c v_c' \tag{5.3} $$

where $\theta_1^2 \geq \ldots \geq \theta_c^2$ are the eigenvalues of $\eta\eta'$, and $u_j$ and $v_j$ are the eigenvectors of $\eta\eta'$ and $\eta'\eta$ corresponding to $\theta_j^2$. Now, let $\ell_1 > \ldots > \ell_{r-1}$ denote the non-zero eigenvalues of $DD'$ where $D = (d_{ij})$ and $d_{ij} = x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}_{..}$. Gollob (1968) considered the problem of testing the hypotheses $\theta_j = 0$, and his tests are based upon the assumption that the $\ell_j$'s are distributed independently as chi-square variables. But the above assumption is not correct. Mandel (1969) proposed heuristically to examine the magnitude of $\ell_j/(\nu_j \hat{\sigma}^2)$ to test for $\theta_j = 0$, where $\nu_j = E(\ell_j)$ and $\hat{\sigma}^2 = (\ell_{c+1} + \ldots + \ell_{r-1})/(\nu_{c+1} + \ldots + \nu_{r-1})$. But the distributions of the above test statistics are not only complicated but also involve nuisance parameters. Corsten and van Eijnsbergen (1972) derived the following likelihood ratio test. Accept or reject $H_1 : \theta_1 = \ldots = \theta_c = 0$ according as

$$ L_1 \lessgtr c_{1\alpha} \tag{5.4} $$

where

$$ P[L_1 \leq c_{1\alpha} \mid H_1] = (1 - \alpha) \tag{5.5} $$

and $L_1 = (\ell_1 + \ldots + \ell_c)/(\ell_1 + \ldots + \ell_{r-1})$. When $c = 1$, the likelihood ratio test statistic was derived independently by Johnson and Graybill (1972). Yochmowitz and Cornell (1978) discussed the likelihood ratio test statistic for testing the hypothesis $\theta_j = 0$ against the alternative $\theta_j \neq 0$ and $\theta_{j+1} = \ldots = \theta_c = 0$. When $H_1$ is true, it is known (e.g., see Johnson and Graybill (1973)) that $\ell_1, \ldots, \ell_{r-1}$ are jointly distributed as the eigenvalues of the central $(r-1) \times (r-1)$ Wishart matrix $W$ with $(s-1)$ degrees of freedom and $E(W) = (s-1)I_{r-1}$. Schuurmann, Krishnaiah and Chattopadhyay (1973) derived the exact distributions of $\ell_1/(\ell_1 + \ldots + \ell_{r-1})$ and $\ell_{r-1}/(\ell_1 + \ldots + \ell_{r-1})$ when $H_1$ is true and computed some of the percentage points of the above statistics. Krishnaiah and Schuurmann (1974) derived the exact distributions of $\ell_j/(\ell_1 + \ldots + \ell_{c-1})$ for $j = 2, 3, \ldots, c-1$ when $H_1$ is true. Schuurmann, Krishnaiah and Chattopadhyay (1973) proposed the following simultaneous test procedure in the spirit of the simultaneous test procedures of Krishnaiah and Waikar (1971a,b) in the area of principal component analysis.
Accept or reject $\theta_i = 0$ according as

$$ \frac{\ell_i}{\ell_1 + \ldots + \ell_{c-1}} \lessgtr c_{2\alpha} \tag{5.6} $$

where

$$ P\left[ \frac{\ell_1}{\ell_1 + \ldots + \ell_{c-1}} \leq c_{2\alpha} \,\Big|\, H_1 \right] = (1 - \alpha). \tag{5.7} $$


For  details  of  other  simultaneous  test  procedures,  the  reader  is  referred  to  Krishnaiah  and 
Yochmowitz  (1980). 
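The eigenvalue ratios used in this section start from the doubly centered residual matrix $D$. The following is a minimal sketch of computing $D$, the eigenvalues of $DD'$, and the ratio $L_1 = (\ell_1 + \ldots + \ell_c)/(\ell_1 + \ldots + \ell_{r-1})$ from an $r \times s$ table, assuming NumPy is available; the function names are hypothetical, and percentage points for the ratio are not computed.

```python
import numpy as np


def interaction_eigenvalues(x):
    """Residual matrix D with d_ij = x_ij - xbar_i. - xbar_.j + xbar_..,
    and the eigenvalues of DD' in decreasing order."""
    x = np.asarray(x, dtype=float)
    d = (x
         - x.mean(axis=1, keepdims=True)   # row means
         - x.mean(axis=0, keepdims=True)   # column means
         + x.mean())                       # grand mean
    eig = np.linalg.eigvalsh(d @ d.T)[::-1]
    return d, eig


def ratio_statistic(eig, c):
    """L_1 = (l_1 + ... + l_c) / (l_1 + ... + l_{r-1}), where r = len(eig)."""
    nonzero = eig[: len(eig) - 1]          # DD' has rank at most r - 1
    return float(nonzero[:c].sum() / nonzero.sum())
```

For a table that is additive apart from a single multiplicative interaction term, all but one eigenvalue vanish and the ratio with $c = 1$ equals one up to rounding error.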

We  will  now  review  the  recent  work  of  Rao  (1985)  on  a  more  general  problem 
of  reduction  of  dimensionality. 

Let $Y : n \times p$ be a random matrix which is distributed as multivariate normal with $E(Y) = M$, and let the covariance matrix of $y$ be $C \otimes \Sigma$, where $y$ is the vector obtained by writing the rows of $Y$ vertically one below the other starting from the first and $C$ is a known positive definite matrix. Also, let $S : p \times p$ be distributed independently of $Y$ as a central Wishart matrix with $s$ degrees of freedom and $E(S) = s\Sigma$. Under the above model, Rao (1985) derived the likelihood ratio tests for testing the hypothesis $H$ where

$$ H : M = X\Psi + \Phi W' + \Gamma \tag{5.8} $$

where $\Sigma$ has a general structure or has a structure of the form

$$ \Sigma = \sigma_1^2 V_1 V_1' + \ldots + \sigma_f^2 V_f V_f' \tag{5.9} $$

where $\sigma_1^2, \ldots, \sigma_f^2$ are unknown and $V_i : p \times g_i$ $(i = 1, 2, \ldots, f)$ is a known matrix of rank $g_i$ such that $p = g_1 + \ldots + g_f$.

In (5.8), $X : n \times b$ is a known matrix of rank $b$, $W : p \times c$ is a given matrix of rank $c$, $\Psi$ and $\Phi$ are matrices of unknown parameters, and $\Gamma$ is a matrix of specified rank $r < \min(k-b, p-c)$. If $X$ is an $n \times 1$ vector of unities and $W$ is a null matrix, the above problem reduces to the problem of specifying the dimensionality of row mean vectors in $M$ considered by Fisher (1939), Fujikoshi (1974), Krishnaiah, Lin and Wang (1985a) and others. If $X$ is an $n \times 1$ vector of unities and $W$ is a $p \times 1$ vector of unities, then $H$ is the hypothesis specifying the rank of interaction in two-way classification with one observation per cell, and this problem was considered when $\Sigma = \sigma^2 I$ and $C = I$.


6.  RANK  OF  COVARIANCE  MATRIX  OF  RANDOM  EFFECTS 
IN  ONE-WAY  COMPONENTS  OF  COVARIANCE  MODEL 


Consider  the  one-way  components  of  covariance  model 

$$ x_{ij} = \mu + \alpha_i + \epsilon_{ij} \tag{6.1} $$

for $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, m_i$, where $\mu$ is the general mean vector, $\alpha_i : p \times 1$ is the vector of random effects, $\epsilon_{ij}$ is the vector of errors, and $x_{ij}$ denotes the j-th observation in the i-th group. Also, $\alpha_i$ and $\epsilon_{ij}$ are distributed independently of each other as multivariate normal with $E(\alpha_i) = E(\epsilon_{ij}) = 0$ and covariance matrices given by $\psi = E(\alpha_i \alpha_i')$ and $E(\epsilon_{ij}\epsilon_{ij}') = \Sigma_1$. We also assume that $E(\alpha_i \alpha_j') = 0$ for $i \neq j$ and $E(\epsilon_{ij}\epsilon_{i'j'}') = 0$ for $i \neq i'$ and/or $j \neq j'$. The covariance matrix of

x  is  given  by  I  where 
-ij  2 


Z2  =  *  +  Zy  (6.2) 

We  assume  that  \jj  is  not  of  full  rank  and  we  are  interested  in  finding  out  the  rank  of  4i 
If  the  rank  of  is  r,  then  there  exists  a  full  rank  matrix  B:(p-r)xp  such  that  Bvp  =  0.  If  the 
rank  of  is  zero,  then  we  conclude  that  there  is  no  difference  between  the  effects  of 
the  groups.  Knowledge  about  the  rank  of  \{i  will  help  to  estimate  \|j  more  efficiently. 

When $m_1 = m_2 = \dots = m_k = m$, the between groups sums of squares and cross products
(SP) matrix and the within group SP matrix are given by $S_b$ and $S_w$ respectively where

$$ S_b = m \sum_{i=1}^{k} (\bar{x}_{i.} - \bar{x}_{..})(\bar{x}_{i.} - \bar{x}_{..})', \qquad
   S_w = \sum_{i=1}^{k} \sum_{j=1}^{m} (x_{ij} - \bar{x}_{i.})(x_{ij} - \bar{x}_{i.})', $$

$$ m\bar{x}_{i.} = \sum_{j=1}^{m} x_{ij}, \qquad km\bar{x}_{..} = \sum_{i=1}^{k} \sum_{j=1}^{m} x_{ij}. \tag{6.3} $$

Then, $S_b$ and $S_w$ are distributed independently as central Wishart matrices with (k-1) and
k(m-1) degrees of freedom respectively, $E(S_b/(k-1)) = \Sigma_2$ and $E(S_w/(k(m-1))) = \Sigma_1$ where, in this
balanced case, $\Sigma_2 = \Sigma_1 + m\psi$. When the sample sizes are unequal, $S_b$ is not distributed as a Wishart matrix.
When the $m_i$'s are equal, Anderson ((1984), (1985)) has derived the likelihood ratio test statistic
for testing the hypothesis that the rank of $\psi$ is not greater than r. Schott and Saw (1984)
derived the likelihood ratio test for rank($\psi$) $\le$ r against the alternative rank($\psi$) = r+1.

We now discuss a more general problem considered by Rao (1983) and Zhao,
Krishnaiah and Bai (1985b). Let $S_1$ and $S_2$ be distributed independently as central Wishart
matrices with $n_1$ and $n_2$ degrees of freedom respectively and let $E(S_i/n_i) = \Sigma_i$ for i = 1,2.
Also, let $\Sigma_2 = \Gamma + \Sigma_1$ where $\Gamma$ is a nonnegative definite matrix. Then, we are interested
in finding the rank of $\Gamma$. Rao (1983) proposed a modified LRT statistic for testing the
hypothesis that the rank of $\Gamma$ is a specified value. We will now discuss the model
selection method proposed by Zhao, Krishnaiah and Bai (1985b) for estimating the rank of
$\Gamma$. Let $\delta_1 \ge \dots \ge \delta_p$ denote the eigenvalues of $S_2 S_1^{-1} n_1/n_2$. Also, let

$$ L_q = \prod_{i=1+\min(q,\tau)}^{p} \left\{ (\alpha_n + \beta_n \delta_i)^{-n/2}\, \delta_i^{n_2/2} \right\} \tag{6.4} $$

where $\tau$ denotes the number of $\delta_i$'s which are greater than one, $\alpha_n = n_1/n$, $\beta_n = n_2/n$ and
$n = n_1 + n_2$. In addition, let

$$ L_{qt} = \prod_{i=1+\min(q,\tau)}^{\min(t,\tau)} \left\{ (\alpha_n + \beta_n \delta_i)^{-n/2}\, \delta_i^{n_2/2} \right\}. \tag{6.5} $$

Zhao, Krishnaiah and Bai (1985b) showed that $L_q$ is the likelihood ratio test statistic for
testing $H_q$ against the alternative that $\Sigma_1$ and $\Sigma_2$ are arbitrary, and $L_{qt}$ is the likelihood ratio
test statistic for testing $H_q$ against $H_t$ (q < t), where $H_j$ denotes the hypothesis that the rank of
$\Gamma$ is equal to j. Now, let

$$ EDC(q, C_n) = -\log L_q + \nu(q,p)\, C_n \tag{6.6} $$

where $\nu(q,p) = \tfrac{1}{2}\, q(2p - q + 1)$ and $C_n$ satisfies the following conditions:

$$ \text{(i)}\ \lim_{n \to \infty} (C_n/n) = 0, \qquad \text{(ii)}\ \lim_{n \to \infty} (C_n/\log\log n) = \infty. \tag{6.7} $$

Zhao, Krishnaiah and Bai (1985b) estimated the unknown rank of $\Gamma$ with $\hat q$ where

$$ EDC(\hat q, C_n) = \min\{EDC(0, C_n), \dots, EDC(p-1, C_n)\}. \tag{6.8} $$

They have also proved that $\hat q$ is strongly consistent. The above procedure can be used to
draw inference on the rank of the covariance matrix of the vector of random effects in the
one-way components of covariance model.
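
The model selection rule in (6.4)-(6.8) can be illustrated with a short numerical sketch. The function name `edc_rank` and the default penalty $C_n = \log n$ are assumptions made here for illustration only; $\log n$ is merely one sequence satisfying both conditions in (6.7), and it is not claimed to be the choice used by Zhao, Krishnaiah and Bai (1985b).

```python
import numpy as np

def edc_rank(S1, S2, n1, n2, C_n=None):
    """Estimate the rank of Gamma = Sigma_2 - Sigma_1 by minimising EDC(q, C_n).

    S1, S2 : p x p Wishart-type SP matrices with n1 and n2 degrees of freedom.
    Returns (q_hat, array of EDC(0..p-1)).
    """
    p = S1.shape[0]
    n = n1 + n2
    alpha_n, beta_n = n1 / n, n2 / n
    if C_n is None:
        C_n = np.log(n)  # assumed penalty: C_n/n -> 0 and C_n/loglog(n) -> infinity

    # Eigenvalues of S2 S1^{-1} n1/n2, i.e. of (S1/n1)^{-1} (S2/n2), in decreasing order.
    delta = np.linalg.eigvals(np.linalg.solve(S1 / n1, S2 / n2)).real
    delta = np.sort(delta)[::-1]
    tau = int(np.sum(delta > 1.0))  # number of eigenvalues greater than one

    # Logarithm of each factor appearing in the product (6.4).
    log_terms = -(n / 2) * np.log(alpha_n + beta_n * delta) + (n2 / 2) * np.log(delta)

    edc = []
    for q in range(p):
        log_Lq = log_terms[min(q, tau):].sum()   # product over i = 1+min(q,tau), ..., p
        nu = q * (2 * p - q + 1) / 2             # number of free parameters, nu(q, p)
        edc.append(-log_Lq + nu * C_n)           # criterion (6.6)
    edc = np.array(edc)
    return int(np.argmin(edc)), edc
```

For instance, with $S_1 = 100\,I_3$, $S_2 = 100\,\mathrm{diag}(10, 1, 1)$ and $n_1 = n_2 = 100$, only one eigenvalue exceeds one and the criterion selects rank one.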


7. SELECTION OF ORIGINAL VARIABLES UNDER
MULTIVARIATE REGRESSION MODEL


In the area of univariate regression analysis, it is of interest to select variables which
are important for prediction. Reviews of the literature on some methods of selection of
variables are given in Krishnaiah (1982) and Thompson ((1978a), (1978b)). In this section, we
review procedures for selection of independent variables which are important for
prediction of a set of dependent variables under the classical multivariate regression model.

Consider the multivariate regression model (3.14) where $X = [x_1, \dots, x_q]$ and $x_i : n \times 1$ is the
vector of n independent observations on the i-th independent variable $x_i$. Also, let $Y = [y_1, \dots, y_p]$
where $y_i : n \times 1$ denotes the vector of n independent observations on the i-th
dependent variable. Then, it is of interest to find out which of the variables $x_1, \dots, x_q$
are important. We can use Roy's largest root test, the $T^2_{\max}$ test or Krishnaiah's finite
intersection tests for the selection of important variables. Now, let $B' = (\beta_1, \dots, \beta_q)$ where $\beta_i$
is of order $p \times 1$. Also, let $H_i : \beta_i = 0$ and

$$ T_i^2 = \frac{(n-q)\, \hat\beta_i' S^{-1} \hat\beta_i}{c_{ii}} \tag{7.1} $$

where $\hat B = (\hat\beta_1, \dots, \hat\beta_q)' = (X'X)^{-1}(X'Y)$, $S = (s_{ij}) = Y'(I - X(X'X)^{-1}X')Y$ and $c_{ii}\Sigma$ is the
covariance matrix of $\hat\beta_i$. According to Roy's largest root test, we accept or reject
$H_i$ according as

$$ T_i^2 \le c_\alpha \quad \text{or} \quad T_i^2 > c_\alpha \tag{7.2} $$

where $c_\alpha$ is chosen such that

$$ P[(n-q)\, C_L(S_h S^{-1}) \le c_\alpha \mid H] = (1-\alpha) \tag{7.3} $$

where $H = \bigcap_{i=1}^{q} H_i$, $C_L(A)$ denotes the largest eigenvalue of A and $S_h = Y'X(X'X)^{-1}X'Y = \hat B'(X'X)\hat B$.
Percentage points of $c_\alpha$ are given in Krishnaiah (1980). If we use the $T^2_{\max}$
test (e.g., see Krishnaiah (1969) and Siotani (1959)), we accept or reject $H_i$
according as

$$ T_i^2 \le c_{\alpha 1} \quad \text{or} \quad T_i^2 > c_{\alpha 1} \tag{7.4} $$

where

$$ P[T_i^2 \le c_{\alpha 1};\ i = 1,2,\dots,q \mid H] = (1-\alpha). \tag{7.5} $$

Approximate values of $c_{\alpha 1}$ can be obtained from the results of Siotani
((1959), (1960), (1961)) for some cases. We conclude that the independent variable $x_i$ is
important or unimportant for prediction of $(y_1, \dots, y_p)$ according as $H_i$ is rejected or accepted.
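
As an illustration, the statistics $T_i^2$ of (7.1) and the largest root quantity $(n-q)\,C_L(S_h S^{-1})$ can be computed as in the sketch below. The function name `t2_statistics` is hypothetical, and the critical values $c_\alpha$ and $c_{\alpha 1}$ are not computed here; they would have to be taken from the tables cited above.

```python
import numpy as np

def t2_statistics(X, Y):
    """Return (T2, bound) for the model E(Y) = X B.

    T2[i]  = (n - q) * beta_hat_i' S^{-1} beta_hat_i / c_ii   (one value per regressor)
    bound  = (n - q) * largest eigenvalue of S_h S^{-1}
    """
    n, q = X.shape
    XtX = X.T @ X
    XtX_inv = np.linalg.inv(XtX)
    B_hat = XtX_inv @ X.T @ Y            # q x p; row i is beta_hat_i'
    resid = Y - X @ B_hat
    S = resid.T @ resid                  # Y'(I - X(X'X)^{-1}X')Y
    S_inv = np.linalg.inv(S)
    c = np.diag(XtX_inv)                 # c_ii

    # Quadratic forms beta_hat_i' S^{-1} beta_hat_i for every row i of B_hat.
    quad = np.einsum('ij,jk,ik->i', B_hat, S_inv, B_hat)
    T2 = (n - q) * quad / c

    S_h = B_hat.T @ XtX @ B_hat          # hypothesis SP matrix
    largest_root = np.max(np.linalg.eigvals(S_h @ S_inv).real)
    return T2, (n - q) * largest_root
```

Every $T_i^2$ is bounded above by $(n-q)\,C_L(S_h S^{-1})$, which is why a critical value for the largest root yields a simultaneous test of $H_1, \dots, H_q$. When p = 1, $T_i^2$ reduces to the square of the usual t statistic for the i-th regression coefficient.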

We now discuss Krishnaiah's finite intersection tests (Krishnaiah (1965)) for the selection of
variables. For an illustration of the application of the finite intersection test, the reader is
referred to Schmidhammer (1982).

Let $\Sigma_k$ denote the top $k \times k$ left-hand corner of $\Sigma = (\sigma_{ij})$ and $\sigma^2_{k+1} = |\Sigma_{k+1}| / |\Sigma_k|$
for k = 0,1,...,p-1 with $|\Sigma_0| = 1$. Also, let $Y_j = [y_1, \dots, y_j]$, $X_j = [x_1, \dots, x_j]$, and $B_j = [\beta_1, \dots, \beta_j]$
for j = 1,2,...,p (here $\beta_j$ denotes the j-th column of B, so that $E(y_j) = X\beta_j$). In addition, let

$$ \gamma_j = \Sigma_j^{-1} (\sigma_{1,j+1}, \dots, \sigma_{j,j+1})' \tag{7.6} $$

for j = 1,2,...,p-1, and $\gamma_0 = 0$. We know that the conditional distribution of $y_{j+1}$ given $Y_j$ is
multivariate normal with covariance matrix $\sigma^2_{j+1} I_n$ and mean vector

$$ E_c(y_{j+1}) = X\eta_{j+1} + Y_j\gamma_j = [X, Y_j] \begin{pmatrix} \eta_{j+1} \\ \gamma_j \end{pmatrix} \tag{7.7} $$

where $\eta_{j+1} = \beta_{j+1} - B_j\gamma_j$ with the understanding that $\eta_1 = \beta_1$. Now, let $H_{ij} : c_i'\eta_j = 0$
where $c_i' = (c_{i1}, \dots, c_{iq})$ for i = 1,2,...,q with

$$ c_{ih} = \begin{cases} 0, & h \ne i \\ 1, & h = i. \end{cases} $$

Then, the hypothesis $H_i$ can be expressed as $H_i = \bigcap_{j=1}^{p} H_{ij}$. So, the problem of testing the
hypotheses $H_1, \dots, H_q$ simultaneously is equivalent to testing the hypotheses $H_{ij}$ simultaneously.
Now, let

$$ F_{ij} = \frac{(c_i'\hat\eta_j)^2\, (n-j-q+1)}{d_{ij}\, s_j^2} $$

where $d_{ij}\sigma_j^2$ is the variance of $c_i'\hat\eta_j$, $\hat\eta_j$ is the least squares estimate of $\eta_j$ under the model
(7.7), and $s^2_{j+1} = |S_{j+1}| / |S_j|$ where $S_j$ is the top $j \times j$ left-hand corner of S. Then, we
accept or reject $H_{ij}$ according as

$$ F_{ij} \le F_\alpha \quad \text{or} \quad F_{ij} > F_\alpha $$

where

$$ P[F_{ij} \le F_\alpha;\ i = 1,\dots,q;\ j = 1,\dots,p \mid H] = \prod_{j=1}^{p} P[F_{ij} \le F_\alpha;\ i = 1,\dots,q \mid H] = (1-\alpha). \tag{7.10} $$

When H is true, the joint distribution of $F_{1j}, \dots, F_{qj}$ is a multivariate F distribution with (1, n-q-j+1)
degrees of freedom. Evaluation of the probability integrals of the multivariate F
distribution was discussed in Krishnaiah and Armitage (1970). The hypothesis $H_i$ is accepted
if $H_{i1}, \dots, H_{ip}$ are accepted and it is rejected otherwise. If $H_i$ is rejected, then we conclude
that the independent variable $x_i$ is important for prediction of the set $(y_1, \dots, y_p)$ of dependent
variables. One may also use the step-down procedure proposed by J. Roy (1958), but the
lengths of the confidence intervals associated with the finite intersection tests are shorter
than the lengths of the corresponding confidence intervals associated with the step-down
procedure. Fujikoshi (1985) proposed a procedure, based on an information theoretic
criterion, to select a subset of variables which are important for discrimination. Rao (1948)
proposed a procedure to find out whether the addition of some independent
variables makes a significant contribution to the prediction of dependent variables.


8.  COMMENTS  ON  STEPWISE  PROCEDURES  FOR  SELECTION  OF  VARIABLES  IN 

DISCRIMINANT  ANALYSIS 


In this section, we discuss the stepwise procedures for the selection of variables in
the area of discriminant analysis for several groups. These procedures are used widely
since computer programs for the implementation of these procedures are available in the
BMD and SPSS packages. Stepwise procedures for the selection of variables in
discriminant analysis were proposed in the literature in a similar way as the corresponding
procedures in regression analysis (Krishnaiah (1982)). We will discuss a stepwise
procedure below.


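Consider the following model:

$$ E(y_j) = A\theta_j \tag{8.1} $$

where

$$ A = [A_1', A_2', \dots, A_k']'. \tag{8.2} $$

In the matrix $A_i : n_i \times k$, the elements in the i-th column are equal to one and the other elements in
the matrix are zero. Also, $\theta_j' = (\mu_{1j}, \dots, \mu_{kj})$, $y_j' = (x_{1j1}, \dots, x_{1jn_1}, \dots, x_{kj1}, \dots, x_{kjn_k})$ and $x_{ijt}$ denotes the
observation on the j-th variable for the t-th individual in the i-th group. Let $H_j : C\theta_j = 0$ where

$$ C = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -1 & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -1 \end{pmatrix} : (k-1) \times k. \tag{8.3} $$

Let $F_j$ denote the usual F statistic used for testing the hypothesis $H_j$. Then

$$ F_j = \frac{b_{jj}\,(n-k)}{w_{jj}\,(k-1)} \tag{8.4} $$

where $W = (w_{ij})$ and $B = (b_{ij})$ are the within group SP matrix and between group SP matrix
respectively. The likelihood ratio statistic for testing $H_j$ is given by $\Lambda(x_j)$ where

$$ \Lambda(x_j) = \frac{w_{jj}}{t_{jj}} \tag{8.5} $$

and $t_{jj} = b_{jj} + w_{jj}$. Obviously,

$$ F_j = \frac{(1 - \Lambda(x_j))\,(n-k)}{\Lambda(x_j)\,(k-1)}. \tag{8.6} $$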

If $\max(F_1, \dots, F_p) \le F_{1\alpha}$, we declare that none of the variables are important for
discrimination and we do not proceed further. Otherwise, we select the variable
corresponding to the maximum of $F_1, \dots, F_p$ as the most important. For example, let this
variable be $x_1$. At the second stage, we test whether any of the
remaining variables $x_2, x_3, \dots, x_p$ give additional information for discrimination between the
populations. A measure of the degree of additional information is provided by

$$ \Lambda_{j \cdot 1} = \frac{\Lambda(x_1, x_j)}{\Lambda(x_1)} \tag{8.7} $$

where

$$ \Lambda(x_1, x_j) = \frac{\begin{vmatrix} w_{11} & w_{1j} \\ w_{j1} & w_{jj} \end{vmatrix}}{\begin{vmatrix} t_{11} & t_{1j} \\ t_{j1} & t_{jj} \end{vmatrix}}. \tag{8.8} $$

In (8.7), $\Lambda(x_1, x_j)$ is the likelihood ratio test statistic for testing the hypothesis that the mean
vectors of $(x_1, x_j)$ are the same in all populations. It can be viewed as a measure of the
discriminating ability of $x_1$ and $x_j$, whereas $\Lambda(x_1)$ is a measure of the degree of
discrimination of the variable $x_1$. As the value of $\Lambda(x_1, x_j)$ decreases, the discriminating
ability of $x_1$ and $x_j$ increases. We can write $\Lambda_{j \cdot 1}$ as

$$ \Lambda_{j \cdot 1} = \frac{w_{j \cdot 1}}{t_{j \cdot 1}} \tag{8.9} $$

where $w_{j \cdot 1} = w_{jj} - w_{j1} w_{11}^{-1} w_{1j}$ and $t_{j \cdot 1} = t_{jj} - t_{j1} t_{11}^{-1} t_{1j}$. Now, let

$$ F_{j \cdot 1} = \frac{b_{j \cdot 1}\,(n-k-1)}{w_{j \cdot 1}\,(k-1)} \tag{8.10} $$

where $b_{j \cdot 1} = t_{j \cdot 1} - w_{j \cdot 1}$ is the adjusted between group sum of squares. We can write
(8.10) as

$$ F_{j \cdot 1} = \frac{(n-k-1)\,(1 - \Lambda_{j \cdot 1})}{(k-1)\,\Lambda_{j \cdot 1}}. \tag{8.11} $$


The above statistic (see Rao (1973)) is nothing but the statistic used to test the
hypothesis

$$ H_{j \cdot 1} : \mu_{1j} - \beta_{j \cdot 1}\mu_{11} = \dots = \mu_{kj} - \beta_{j \cdot 1}\mu_{k1} \tag{8.12} $$

where $\beta_{j \cdot 1} = \sigma_{j1}\sigma_{11}^{-1}$. If $\max(F_{2 \cdot 1}, \dots, F_{p \cdot 1}) \le F_{2\alpha}$, we declare that none of the variables
$x_2, x_3, \dots, x_p$ are important; here $F_{2\alpha}$ is the upper 100$\alpha$% point of the central F distribution with
(k-1, n-k-1) degrees of freedom. If $\max(F_{2 \cdot 1}, \dots, F_{p \cdot 1}) > F_{2\alpha}$, the variable corresponding to
the maximum of $F_{2 \cdot 1}, \dots, F_{p \cdot 1}$ is declared to be important. For simplicity of notation, let us
assume that this variable is, say, $x_2$. After having selected $x_2$, we will test whether the
variable (in this case $x_1$) selected at the first stage is good for discrimination in the presence
of the variable $x_2$; this is the third step. This can be tested by using the following test
statistic:

$$ F_{1 \cdot 2} = \frac{b_{1 \cdot 2}\,(n-k-1)}{w_{1 \cdot 2}\,(k-1)} \tag{8.13} $$

where $t_{1 \cdot 2} = t_{11} - t_{12} t_{22}^{-1} t_{21}$, $w_{1 \cdot 2} = w_{11} - w_{12} w_{22}^{-1} w_{21}$ and $b_{1 \cdot 2} = t_{1 \cdot 2} - w_{1 \cdot 2}$. We
decide to retain or exclude $x_1$ from the selected subset according as

$$ F_{1 \cdot 2} > F_{2\alpha} \quad \text{or} \quad F_{1 \cdot 2} \le F_{2\alpha}. \tag{8.14} $$

Here we note that

$$ F_{1 \cdot 2} = \frac{(n-k-1)\,(1 - \Lambda_{1 \cdot 2})}{(k-1)\,\Lambda_{1 \cdot 2}} \tag{8.15} $$

where

$$ \Lambda_{1 \cdot 2} = \frac{w_{1 \cdot 2}}{t_{1 \cdot 2}}, \tag{8.16} $$

$w_{1 \cdot 2} = w_{11} - w_{12} w_{22}^{-1} w_{21}$ and $t_{1 \cdot 2} = t_{11} - t_{12} t_{22}^{-1} t_{21}$. If $\Lambda^*_{1 \cdot 2} = 1/\Lambda_{1 \cdot 2}$, then

$$ F_{1 \cdot 2} = \frac{(n-k-1)\,(\Lambda^*_{1 \cdot 2} - 1)}{(k-1)}. \tag{8.17} $$


In the fourth step, we either select one of the variables $x_3, \dots, x_p$ or decide not to select
any more on the basis of the discriminating ability of these variables individually in the presence
of $x_1$ and $x_2$. If we discard $x_1$ at the third step, then we consider the discriminating
ability of the variables $x_3, \dots, x_p$ in the presence of $x_2$ only. This procedure is continued until a
decision is made not to select any more variables or all the variables are selected.

Suppose, after a few stages, we selected $x_3, x_4, \dots, x_j$ and $x_j$ is the latest addition to the
selected subset. Then, we test whether $x_3, x_4, \dots, x_{j-1}$ are individually important in the presence
of the remaining variables. For example, we test whether $x_4$ is important in the presence of
the variables $x_3, x_5, x_6, \dots, x_j$. The statistic used to test whether $x_i$ (i = 3,4,...,j-1) is important
is given by

$$ F_{i \cdot (3,4,\dots,j)} = \frac{b_{i \cdot (3,4,\dots,j)}\,(n-k-j+3)}{w_{i \cdot (3,4,\dots,j)}\,(k-1)} \tag{8.18} $$

with the understanding that the suffix i does not occur in the set (3,4,...,j). Let

$$ \Lambda_{i \cdot (3,4,\dots,j)} = \frac{\Lambda(x_3, x_4, \dots, x_i, \dots, x_j)}{\Lambda(x_3, x_4, \dots, x_{i-1}, x_{i+1}, \dots, x_j)} \tag{8.19} $$

where $\Lambda(x_3, x_4, \dots, x_i, \dots, x_j)$ is the ratio of the determinant of the within group SP matrix based
upon the variables $x_3, x_4, \dots, x_i, \dots, x_j$ and the determinant of the total SP matrix based upon the
same variables. Similarly, $\Lambda(x_3, x_4, \dots, x_{i-1}, x_{i+1}, \dots, x_j)$ can be defined. So,

$$ \Lambda_{i \cdot (3,4,\dots,j)} = \frac{w_{i \cdot (3,4,\dots,j)}}{t_{i \cdot (3,4,\dots,j)}}. \tag{8.20} $$

Hence

$$ F_{i \cdot (3,4,\dots,j)} = \frac{(n-k-j+3)\,(1 - \Lambda_{i \cdot (3,4,\dots,j)})}{(k-1)\,\Lambda_{i \cdot (3,4,\dots,j)}}. \tag{8.21} $$

The variable $x_i$ is retained or excluded according as $F_{i \cdot (3,4,\dots,j)}$ is greater than or less than the
upper 100$\alpha$% point of the central F distribution with (k-1, n-k-j+3) degrees of freedom.
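
The first two stages of the stepwise procedure can be sketched numerically as follows. The helper names `sp_matrices` and `first_two_stages` are hypothetical and introduced only for this illustration; the sketch computes $F_j$ of (8.4) and the partial statistics $F_{j \cdot 1}$ of (8.10), and leaves the comparison with the critical values $F_{1\alpha}$ and $F_{2\alpha}$ to the reader.

```python
import numpy as np

def sp_matrices(X, groups):
    """Within group (W) and total (T) SP matrices for data X (n x p) and group labels."""
    centered = X - X.mean(axis=0)
    T = centered.T @ centered
    W = np.zeros_like(T)
    for g in np.unique(groups):
        d = X[groups == g] - X[groups == g].mean(axis=0)
        W += d.T @ d
    return W, T

def first_two_stages(W, T, n, k):
    """Stage 1: F_j for every variable. Stage 2: F_{j.first} given the first selected variable."""
    p = W.shape[0]
    B = T - W                                            # between group SP matrix
    F = np.diag(B) * (n - k) / (np.diag(W) * (k - 1))    # equation (8.4)
    first = int(np.argmax(F))                            # most important single variable

    F_partial = {}
    for j in range(p):
        if j == first:
            continue
        # Adjusted within and total sums of squares, as in (8.9).
        w_adj = W[j, j] - W[j, first] ** 2 / W[first, first]
        t_adj = T[j, j] - T[j, first] ** 2 / T[first, first]
        b_adj = t_adj - w_adj                            # adjusted between group SS
        F_partial[j] = b_adj * (n - k - 1) / (w_adj * (k - 1))   # equation (8.10)
    return F, first, F_partial
```

The values in `F_partial` agree with (8.11) when $\Lambda_{j \cdot 1}$ is computed directly from the ratio of determinants in (8.7) and (8.8), which gives a simple numerical check of the algebra.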

At any stage, we can test whether all the variables selected together will discriminate
between the groups by using many standard procedures. For example, suppose $x_1, x_2, \dots, x_q$
are selected. Then, we compute $B_{11}$ and $W_{11}$ which are respectively the between group SP
matrix and the within group SP matrix based on $x_1, \dots, x_q$. They are given by

$$ B_{11} = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1q} \\ b_{21} & b_{22} & \cdots & b_{2q} \\ \vdots & & & \vdots \\ b_{q1} & b_{q2} & \cdots & b_{qq} \end{pmatrix}, \qquad
   W_{11} = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1q} \\ w_{21} & w_{22} & \cdots & w_{2q} \\ \vdots & & & \vdots \\ w_{q1} & w_{q2} & \cdots & w_{qq} \end{pmatrix}. \tag{8.22} $$

We can test whether the variables $x_1, \dots, x_q$ together will discriminate between the
populations by computing various functions of the eigenvalues of $B_{11} W_{11}^{-1}$. Some of these
functions are $C_L(B_{11} W_{11}^{-1})$, $\mathrm{tr}(B_{11} W_{11}^{-1})$, $\mathrm{tr}(B_{11}(B_{11} + W_{11})^{-1})$ and $|B_{11}(B_{11} + W_{11})^{-1}|$. One
can also use finite intersection tests.

We now have a critical look at the stepwise procedure for the selection of
variables. At the first stage of the procedure, we choose the critical value $F_{1\alpha}$ such that

$$ P[F_j \le F_{1\alpha} \mid H_j] = (1-\alpha). \tag{8.23} $$

Here, the hypotheses $H_1, \dots, H_p$ are tested individually. Since the decision not to select or
select any variable at the first stage is based upon whether or not all the hypotheses are
accepted simultaneously, it would be natural to test them simultaneously and choose
the critical value $F_{1\alpha}$ such that

$$ P\Big[F_j \le F_{1\alpha};\ j = 1,2,\dots,p \,\Big|\, \bigcap_{j=1}^{p} H_j\Big] = (1-\alpha). \tag{8.24} $$

The joint distribution of $F_1, \dots, F_p$ is not only complicated but also involves nuisance
parameters. But, we can use Bonferroni's inequality to compute an upper bound on $F_{1\alpha}$.
At the first stage, we select one variable only as the most important and no decision is
made about the other variables. But, this "most important variable" may be discarded at a later
stage. So, there is some inconsistency in this method and we will discuss this point later.
At the second stage, the critical value $F_{2\alpha}$ is chosen such that

$$ P[F_{j \cdot 1} \le F_{2\alpha} \mid H_{j \cdot 1}] = (1-\alpha). \tag{8.25} $$

We go to the second stage if and only if $\max(F_1, \dots, F_p) > F_{1\alpha}$. So, at the second stage,
we have to compute the following conditional probabilities instead of (8.23) even if we are
testing the hypotheses $H_{j \cdot 1}$ individually:

$$ P[F_{j \cdot 1} \le F_{2\alpha} \mid \max(F_1, \dots, F_p) > F_{1\alpha}]. \tag{8.26} $$

It is quite complicated to compute the above probabilities. Apart from it, we have to test
$H_{2 \cdot 1}, \dots, H_{p \cdot 1}$ simultaneously instead of testing them individually. At the second stage, we
select the variable (say $x_2$) corresponding to $\max(F_{2 \cdot 1}, \dots, F_{p \cdot 1})$. The statistic $F_{j \cdot 1}$ for any
given j is useful for testing whether the variable $x_j$ gives additional information for
discrimination between the groups in the presence of the important variable $x_1$. But, the
variable $x_1$ which is declared to be the most important at the first stage may be discarded
as being unimportant at a later stage and so the procedure may not be meaningful. Apart
from it, the choice of the critical values is very arbitrary and we cannot say what the Type
I error of this procedure is. In view of the points raised above, we do not recommend
the use of the above stepwise procedures. Krishnaiah (1982) discussed the disadvantages
of using forward selection and backward selection procedures for selection of variables
under univariate regression models. Similar criticism applies to forward selection and
backward selection procedures for selection of variables in discriminant analysis.


9. TESTS FOR THE RANK OF THE CANONICAL CORRELATION MATRIX


It is known that the multiple correlation coefficient is the maximum correlation between
a variable and linear combinations of a set of variables. Hotelling ((1935), (1936)) generalized
the above concept to two sets of variables $x_1' : 1 \times p_1$ and $x_2' : 1 \times p_2$ and introduced canonical
correlation analysis. Canonical correlation analysis is useful in studying the relationship
between the two sets of variables. Let the covariance matrix of $x' = (x_1', x_2')$ be $\Sigma$ where

$$ \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \tag{9.1} $$

and $\Sigma_{ii} : p_i \times p_i$ is the covariance matrix of $x_i$. Then $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is known to be the
canonical correlation matrix. Without loss of generality, we assume that $p_1 \le p_2$. Let $\rho_1^2 \ge \dots \ge \rho_{p_1}^2$
denote the eigenvalues of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Here, $\rho_1, \dots, \rho_{p_1}$ are known as canonical
correlations where $\rho_i$ is the positive square root of $\rho_i^2$. Now, let $\alpha_i$ and $\beta_i$ denote the
eigenvectors of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ respectively corresponding to $\rho_i^2$. Then
$\alpha_1'x_1, \dots, \alpha_{p_1}'x_1$ and $\beta_1'x_2, \dots, \beta_{p_1}'x_2$ are known as canonical variables. One of the important
problems in the area of canonical correlation analysis is to find out the number of
canonical correlations which are significantly different from zero. In this section, we
discuss some procedures for testing the hypothesis on the rank of the canonical
correlation matrix when the underlying distribution is multivariate normal.


Let $X : n \times p$ be a random matrix such that $E(X) = 0$ and $E(X'X) = n\Sigma$. Also, let

$$ S = X'X = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \tag{9.2} $$

where $S_{ij}$ is of order $p_i \times p_j$. In addition, let $r_1^2 \ge \dots \ge r_{p_1}^2$ denote the eigenvalues of
$S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}$. Then $r_1, \dots, r_{p_1}$ are known as the sample canonical correlations where $r_i$ is
the positive square root of $r_i^2$. Various functions of $r_1^2, \dots, r_{p_1}^2$ were proposed in the
literature as test statistics for determination of the rank of the canonical correlation matrix.
We will review these procedures in this section.
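
As a small illustration of the definitions above, the squared sample canonical correlations can be obtained directly from the partition (9.2). The function name `squared_canonical_correlations` is hypothetical; the sketch assumes that the columns of X have mean zero, that the first $p_1$ columns form the first set, and that $p_1 \le p_2$.

```python
import numpy as np

def squared_canonical_correlations(X, p1):
    """Eigenvalues r_1^2 >= ... >= r_{p1}^2 of S11^{-1} S12 S22^{-1} S21, with S = X'X."""
    S = X.T @ X
    S11, S12 = S[:p1, :p1], S[:p1, p1:]
    S21, S22 = S[p1:, :p1], S[p1:, p1:]
    # Solve linear systems rather than forming explicit inverses.
    R = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
    r2 = np.sort(np.linalg.eigvals(R).real)[::-1]
    # Guard against rounding slightly outside [0, 1].
    return np.clip(r2, 0.0, 1.0)
```

If one variable of the second set is an exact copy of a variable in the first set, the largest squared canonical correlation returned is one, as expected.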


We first assume that the rows of X are distributed independently as multivariate
normal. In this case, Bartlett (1947) proposed a procedure for testing the hypothesis $H_k$
where $H_k$ denotes $\rho_{k+1}^2 = \dots = \rho_{p_1}^2 = 0$. He also derived the asymptotic distribution of the
associated test statistic. Fujikoshi (1974) showed that the above test statistic is the LRT statistic.
Hsu (1941) derived the asymptotic joint distribution of the sample canonical correlations when
$H_k$ is true. When the population canonical correlations $\rho_1, \dots, \rho_{p_1}$ have multiplicities and none
of them is equal to zero, Fujikoshi (1978) derived the nonnull distribution of a single
function of the sample canonical correlations whereas Krishnaiah and Lee (1979) derived the
asymptotic joint distribution of functions of the sample canonical correlations. The
expressions derived by Krishnaiah and Lee involve the multivariate normal density and
multivariate Hermite polynomials. When the underlying distribution is not multivariate normal,
Fang and Krishnaiah (1982) obtained results analogous to those obtained in the above paper
of Krishnaiah and Lee.

Now, let the joint distribution of the elements of X be elliptically symmetric with
density given by

$$ f(X) = |\Sigma|^{-n/2}\, h(\mathrm{tr}\, \Sigma^{-1} X'X). \tag{9.3} $$

Then, Krishnaiah, Lin and Wang (1985) showed that the LRT statistic for testing the
hypothesis $\rho_{k+1} = \dots = \rho_{p_1} = 0$ is given by

$$ L(k) = \prod_{i=k+1}^{p_1} (1 - r_i^2)^{n/2}. \tag{9.4} $$

So, the LRT statistic is the same as when the underlying distribution is multivariate normal.
They also noted that the distribution of any function of $r_1^2, \dots, r_{p_1}^2$ is independent of the form
of the underlying distribution as long as the underlying distribution belongs to the family of
elliptical distributions.

We will now review some of the work reported in the literature on canonical
correlation analysis when it is assumed that the observations are distributed independently
and identically as elliptically symmetric with the following common density:

$$ f(x) = |\Sigma|^{-1/2}\, h(x'\Sigma^{-1}x). \tag{9.5} $$

Now, let

$$ c_i = \frac{\sqrt{n}\,(r_i^2 - \rho_i^2)}{2\rho_i(1 - \rho_i^2)}. \tag{9.6} $$

Then Muirhead and Waternaux (1980) showed that $c_1, \dots, c_{p_1}$ are asymptotically distributed
independently as normal with mean 0 and variance $(\kappa + 1)$ when $\rho_1, \dots, \rho_{p_1}$ are distinct, where
$\kappa$ is the kurtosis parameter of the elliptical distribution. This is
a special case of a result of Fang and Krishnaiah (1982). Krishnaiah, Lin and Wang (1985b)
derived the asymptotic joint distribution of the sample canonical correlations when the
population canonical correlations have multiplicities and the last few population canonical
correlations are zero. In particular, they showed that the joint asymptotic distribution of
$(n r_{s+1}^2/(\kappa+1), \dots, n r_{p_1}^2/(\kappa+1))$ when $H_s : \rho_{s+1} = \dots = \rho_{p_1} = 0$ is true is the same as the joint distribution
of the eigenvalues of the central Wishart matrix $W_{p_1-s}$ with $(p_2 - s)$ degrees of freedom
and $E(W_{p_1-s}) = (p_2 - s) I_{p_1-s}$. This result is useful in the implementation of certain test
procedures for $H_s$ when the sample size is large. For example, we can use $r_{s+1}^2$ or
$(r_{s+1}^2 + \dots + r_{p_1}^2)$ as a test statistic for $H_s$.


We now discuss the problem of testing for the rank of the canonical correlation
matrix under the correlated multivariate regression equations (CMRE) model considered by
Kariya, Fujikoshi and Krishnaiah (1984). Consider the CMRE model

$$ Y_i = X_i\theta_i + E_i \tag{9.7} $$

for i = 1,2. In the above model, the rows of $(E_1, E_2)$ are distributed independently as
multivariate normal with mean vector 0 and covariance matrix $\Sigma$ where

$$ \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \tag{9.8} $$

and $\Sigma_{ij}$ is of order $p_i \times p_j$. Also, $X_i : n \times r_i$ is the design matrix and $\theta_i : r_i \times p_i$ is the matrix
of unknown parameters for i = 1,2. Without loss of generality, we assume that $p_1 \le p_2$.
Now, let

$$ S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \tag{9.9} $$

where $S_{ij} = Y_i'Q_iQ_jY_j$ and $Q_i = I - X_i(X_i'X_i)^{-1}X_i'$. Also, let $R = S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}$. Kariya,
Fujikoshi and Krishnaiah (1984) investigated the problem of testing the hypothesis that
$\rho_1 = \dots = \rho_{p_1} = 0$. They also derived the asymptotic distributions of three statistics in the null
case and under local alternatives. We can test the hypothesis that $\rho_t = \dots = \rho_{p_1} = 0$ by
considering suitable functions of $r_t^2, \dots, r_{p_1}^2$ like $r_t^2$, $r_t^2 + \dots + r_{p_1}^2$ etc., where $r_1^2 \ge \dots \ge r_{p_1}^2$ are
the eigenvalues of the sample canonical correlation matrix $S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}$.

For an application of canonical correlation analysis in econometrics,
the reader is referred to Hannan (1967) and Chow and Ray-Chowdhuri (1967).


10. MODEL SELECTION METHODS FOR DETERMINATION OF
THE RANK OF THE CANONICAL CORRELATION MATRIX


Let $X' = [x_1, \dots, x_n] : p \times n$ be a random matrix whose columns are distributed
independently and identically as multivariate normal with common mean vector 0 and
covariance matrix $\Sigma$. Let $x_j$ and $\Sigma$ be partitioned as $x_j' = (x_{1j}', x_{2j}')$ and

$$ \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \tag{10.1} $$

where $\Sigma_{ij}$ is of order $p_i \times p_j$ and $x_{ij}$ is of order $p_i \times 1$. Let $\rho_1^2 \ge \dots \ge \rho_s^2$ denote the
s largest eigenvalues of the population canonical correlation matrix $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ where
$s = \min(p_1, p_2)$. Also, let $r_1^2 \ge \dots \ge r_s^2$ denote the s largest eigenvalues of
$S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}$ where

$$ S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \tag{10.2} $$

and $S_{ij}$ is of order $p_i \times p_j$. Let $M_k$ (k = 0,1,2,...,s) denote the model for which rank($\Sigma_{12}$) = k,
that is, the number of nonzero canonical correlations is equal to k. Also, let $H_k$ denote the
hypothesis that rank($\Sigma_{12}$) = k. Let L(k) denote the likelihood ratio test statistic for $H_k$.
Then

$$ \log L(k) = (n/2) \sum_{i=k+1}^{s} \log(1 - r_i^2). \tag{10.3} $$

Now, let

$$ G(k) = -\log L(k) + kC_n \tag{10.4} $$

where $C_n$ satisfies the following conditions:

$$ \text{(i)}\ \lim_{n \to \infty} \{C_n/n\} = 0, \qquad \text{(ii)}\ \lim_{n \to \infty} \{C_n/\log\log n\} > p_1 p_2. \tag{10.5} $$

Let q denote the true rank of $\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}$. Then, Bai, Krishnaiah and Zhao (1986a)
proposed to use $\hat q$ as an estimate of q where $\hat q$ is given by

$$ G(\hat q) = \min\{G(0), \dots, G(s)\}. \tag{10.6} $$

The above authors also showed that $\hat q$ is a strongly consistent estimate of q. Now,
assume that the assumption of normality is violated but $x_1, \dots, x_n$ are i.i.d. vectors with $E(x_1) = 0$,
$E(x_1 x_1') = \Sigma$ and $E(x_1'x_1)^2 < \infty$. When the assumption of normality is violated, L(k) need
not be the LRT statistic for $H_k$ but we can still use it in (10.4). Then Bai, Krishnaiah and
Zhao (1986) showed that $\hat q$ in (10.6) is still a strongly consistent estimate of q under
certain conditions.
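
The selection rule defined by (10.3), (10.4) and (10.6) is simple to evaluate once the squared sample canonical correlations are available. In the sketch below, the function name `estimate_rank` and the default $C_n = \log n$ are assumptions made for illustration; any other penalty sequence satisfying the stated conditions on $C_n$ could be passed instead, and all $r_i^2$ are assumed to be strictly less than one.

```python
import numpy as np

def estimate_rank(r2, n, C_n=None):
    """Return (q_hat, G) where G[k] = -log L(k) + k * C_n for k = 0, ..., s."""
    r2 = np.sort(np.asarray(r2, dtype=float))[::-1]   # r_1^2 >= ... >= r_s^2
    s = r2.size
    if C_n is None:
        C_n = np.log(n)                                # assumed penalty sequence
    log_terms = np.log1p(-r2)                          # log(1 - r_i^2)
    # log L(k) = (n/2) * sum_{i=k+1}^{s} log(1 - r_i^2); the sum is empty for k = s.
    G = np.array([-(n / 2) * log_terms[k:].sum() + k * C_n for k in range(s + 1)])
    return int(np.argmin(G)), G
```

For example, with n = 200 and squared sample canonical correlations (0.8, 0.5, 0.001), the criterion is minimized at k = 2, so two canonical correlations are judged to be nonzero.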


11.  SELECTION  OF  ORIGINAL  VARIABLES  IN 
CANONICAL  CORRELATION  ANALYSIS 


Let  us  consider  the  set  (x'  ,x'  )  of  p  +  p  variables.  We  wish  to  select  a  set  of  r 

- 1  -2  *1  2  2 

important  variables  from  the  x^  set  on  the  basis  of  the  degree  of  dependence  with  x^  set. 


There  are  I  sets  Let  these  sets  be  denoted  by  x  and  let  the  sample  canonical 

\ r 2  /  '  ~'i 

V  ' 

correlation  matrix  between  x  set  and  x  set  be  denoted  by  S  S  S  S  We  use  the 

-i  -t  1 1  if  f  f  t  i 
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largest  root  of  the  canonical  correlation  matrix  as  a  criterion  to  select  the  variables.  We 
declare  that  none  of  these  sets  are  important  if 


max  c  (S  S  S  S  )  <  c 

i  1  i  if  f  f  f  1  -  a 

i  i  i  i  i 


where  c  (A)  denotes  the  largest  eigenvalue  of  A  If 


max  c  (S  S  S  S  )  >  c 

L  1  1  If  f  f  f  1  'X 


1 


then  the  set  corresponding  to  max  c  (S  S  S  S  )  is  declared  to  be  the  most  important 

^  3  l  1 1  If  f  f  f  1 


The critical value $c_\alpha$ is chosen such that

$P[\max_i c_L(S_{11}^{-1}S_{1t_i}S_{t_it_i}^{-1}S_{t_i1}) \le c_\alpha \mid H] = (1-\alpha)$

and $H: \Sigma_{12} = 0$. In other words, the critical value $c_\alpha$ is chosen such that the probability of declaring that none of the sets are important, when in fact none of the variables in the $\mathbf{x}_2$ set are correlated with the $\mathbf{x}_1$ set, is $(1-\alpha)$. But the distribution of $\max_i c_L(S_{11}^{-1}S_{1t_i}S_{t_it_i}^{-1}S_{t_i1})$ is very complicated to derive. So, we use the following bound to get an approximate value:

$P[c_L(S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}) \le c_\alpha \mid H] \le P[\max_i c_L(S_{11}^{-1}S_{1t_i}S_{t_it_i}^{-1}S_{t_i1}) \le c_\alpha \mid H] = (1-\alpha).$
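The selection rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure as implemented by any of the cited authors: it assumes NumPy, a sample covariance matrix whose first `p1` coordinates form the $\mathbf{x}_1$ set, and a user-supplied critical value. The function names and the value `c_alpha=0.5` are hypothetical placeholders, not tabulated percentage points.

```python
from itertools import combinations

import numpy as np


def largest_root(S, idx1, idx2):
    """Largest eigenvalue of S11^{-1} S12 S22^{-1} S21 for the given index sets."""
    S11 = S[np.ix_(idx1, idx1)]
    S12 = S[np.ix_(idx1, idx2)]
    S22 = S[np.ix_(idx2, idx2)]
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
    return float(np.max(np.linalg.eigvals(M).real))


def select_subset(S, p1, r, c_alpha):
    """Return (best subset or None, its largest root) over all r-subsets of x_2."""
    idx1 = list(range(p1))
    candidates = combinations(range(p1, S.shape[0]), r)
    roots = {t: largest_root(S, idx1, list(t)) for t in candidates}
    best = max(roots, key=roots.get)
    if roots[best] <= c_alpha:
        return None, roots[best]  # no subset is declared important
    return best, roots[best]


# Simulated data (an assumption for illustration): the first x_1 variable
# depends strongly on the second x_2 variable, i.e. column p1 + 1 overall.
rng = np.random.default_rng(0)
n, p1, p2 = 500, 2, 4
x2 = rng.standard_normal((n, p2))
x1 = np.column_stack([x2[:, 1] + 0.1 * rng.standard_normal(n),
                      rng.standard_normal(n)])
S = np.cov(np.hstack([x1, x2]), rowvar=False)

best_set, best_root = select_subset(S, p1, r=1, c_alpha=0.5)
full_root = largest_root(S, list(range(p1)), list(range(p1, p1 + p2)))
print(best_set, round(best_root, 3), round(full_root, 3))
```

The last line also computes the largest root for the full $\mathbf{x}_2$ set; it can never be smaller than the largest root for any subset, which is the fact behind the probability bound.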


We will now discuss an alternative procedure for the selection of the best subset of $q$ variables from the $\mathbf{x}_2$ set. Let $\mathbf{x}_{t_i}$ ($i = 1, 2, \ldots, \binom{p_2}{q}$) denote a subset of $q$ variables from the $p_2$ variables $\mathbf{x}_2$. As before, let $\Sigma_{11}^{-1}\Sigma_{1t_i}\Sigma_{t_it_i}^{-1}\Sigma_{t_i1}$ denote the canonical correlation matrix connected with the $\mathbf{x}_1$ set and the $\mathbf{x}_{t_i}$ set. Let $\psi_i$ denote a suitable function of the eigenvalues of the above matrix. Also, let $\hat{\psi}_i$ denote the corresponding function of the eigenvalues of $S_{11}^{-1}S_{1t_i}S_{t_it_i}^{-1}S_{t_i1}$. In addition, let $\psi_1, \ldots, \psi_{p_0}$ be ordered as $\psi_{[1]} \ge \psi_{[2]} \ge \cdots \ge \psi_{[p_0]}$ where $p_0 = \binom{p_2}{q}$. Then, the subset associated with the maximum of $\hat{\psi}_1, \ldots, \hat{\psi}_{p_0}$ is declared to be the best subset. Suppose $\psi_i$ is the largest of the $\psi$'s. In this case, the probability of correct decision is given by the probability of $\hat{\psi}_i$ being greater than $\hat{\psi}_j$ ($j = 1, \ldots, i-1, i+1, \ldots, p_0$) when $\psi_i$ is greater than $\psi_j$ for $j \ne i$. This probability involves nuisance parameters. One may use bounds which are free from nuisance parameters.

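We now discuss the problem of studying the effect of additional variables on the canonical correlations. Consider two sets of variables $\mathbf{x}_1 : p_1 \times 1$ and $\mathbf{y}_1 : q_1 \times 1$. Without loss of generality, we assume that $p_1 \le q_1$. Suppose the sets of variables $\mathbf{x}_1$ and $\mathbf{y}_1$ are augmented to $\mathbf{x} : p \times 1$ and $\mathbf{y} : q \times 1$ by adding extra sets of variables $\mathbf{x}_2 : p_2 \times 1$ and $\mathbf{y}_2 : q_2 \times 1$ respectively. Also, we assume that $(\mathbf{x}', \mathbf{y}')'$ is distributed as multivariate normal with mean vector $\mu$ and covariance matrix $\Sigma$ where

$\Sigma = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}$

and $\Sigma_{xx}$ is the covariance matrix of $\mathbf{x}$. Let $\rho_1 \ge \cdots \ge \rho_{p_1}$ denote the canonical correlations between the sets $\mathbf{x}_1$ and $\mathbf{y}_1$ and let $\tilde{\rho}_1 \ge \cdots \ge \tilde{\rho}_p$ denote the canonical correlations between $\mathbf{x}$ and $\mathbf{y}$. Also, let $\delta_\alpha = \tilde{\rho}_\alpha - \rho_\alpha$ for $\alpha = 1, 2, \ldots, p_1$. Then $\delta_\alpha \ge 0$. Next, let

$S = \begin{pmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{pmatrix}$

denote the sample covariance matrix based on $(n+1)$ observations on $(\mathbf{x}', \mathbf{y}')$. The sample canonical correlations $\tilde{r}_1 \ge \cdots \ge \tilde{r}_p$ are the positive square roots of the eigenvalues of $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$. Similarly, let $r_1 \ge \cdots \ge r_{p_1}$ denote the sample canonical correlations based on $(n+1)$ observations on $(\mathbf{x}_1', \mathbf{y}_1')$.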


Now, let $f(d_1, \ldots, d_{p_1})$ be a continuously differentiable function in a neighborhood of $\mathbf{d} = \boldsymbol{\delta}$ where $\mathbf{d} = (d_1, \ldots, d_{p_1})$ and $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_{p_1})$. Then, Fujikoshi, Krishnaiah and Schmidhammer (1985) showed that $\sqrt{n}\{f(d_1, \ldots, d_{p_1}) - f(\delta_1, \ldots, \delta_{p_1})\}$ is distributed asymptotically as normal with mean zero and a certain variance $\sigma^2$. When $f(d_1, \ldots, d_{p_1}) = d_1$ and $p = p_1$ or $q = q_1$, the above result was derived by Wijsman (1986). The result of Fujikoshi, Krishnaiah and Schmidhammer (1985) can be used to find out whether the addition of new variables to one or both of the sets $\mathbf{x}_1$ and $\mathbf{y}_1$ will have an effect on functions of the canonical correlations. For example, we can draw inference as to whether the addition of variables will increase the values of the largest canonical correlation, the sum of the canonical correlations, etc. If there is no significant increase, we will conclude that the new variables are not important in explaining the association between the two sets of variables. Fujikoshi (1985) proposed a procedure based on an information theoretic criterion to select the best variables in canonical correlation analysis.
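As a rough numerical illustration of the quantities involved, the sketch below computes the sample canonical correlations before and after augmentation and their differences. It assumes NumPy and simulated data, and it takes $d_\alpha$ to be the difference between the $\alpha$-th sample canonical correlation of the augmented sets and that of the original sets, which is our reading of the sample analog of $\delta_\alpha$ rather than a definition quoted from the cited papers. The function name and column layout are hypothetical.

```python
import numpy as np


def canonical_correlations(S, ix, iy):
    """Positive square roots of the eigenvalues of Sxx^{-1} Sxy Syy^{-1} Syx,
    in decreasing order, for the coordinates ix (x set) and iy (y set)."""
    Sxx = S[np.ix_(ix, ix)]
    Sxy = S[np.ix_(ix, iy)]
    Syy = S[np.ix_(iy, iy)]
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    eig = np.sort(np.linalg.eigvals(M).real)[::-1]
    return np.sqrt(np.clip(eig, 0.0, 1.0))


# Simulated correlated data: columns 0-1 are x_1, column 2 is x_2,
# columns 3-4 are y_1 and column 5 is y_2 (layout is an assumption).
rng = np.random.default_rng(1)
n = 200
data = rng.standard_normal((n + 1, 6)) @ rng.standard_normal((6, 6))
S = np.cov(data, rowvar=False)

r_small = canonical_correlations(S, [0, 1], [3, 4])       # x_1 with y_1
r_full = canonical_correlations(S, [0, 1, 2], [3, 4, 5])  # x with y
d = r_full[: r_small.size] - r_small                      # assumed sample analog of delta
print(r_small, r_full, d)
```

Each entry of `d` is nonnegative, mirroring the population inequality $\delta_\alpha \ge 0$; whether an observed increase is significant is what the asymptotic normality result is meant to address.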


12.  REDUCTION  OF  DIMENSIONALITY  AND  THE  STRUCTURE  OF
DEPENDENCE  IN  TWO-WAY  CONTINGENCY  TABLE 

Consider a two-way contingency table and let $p_{ij}$ ($i = 1, 2, \ldots, r+1$; $j = 1, 2, \ldots, s+1$) denote the probability of an observation falling in the $i$-th row and $j$-th column. We will consider the model

$p_{ij} = p_{i\cdot}\, p_{\cdot j}\, \zeta_{ij}$  (12.1)

where $p_{i\cdot} = p_{i1} + \cdots + p_{i,s+1}$, $p_{\cdot j} = p_{1j} + \cdots + p_{r+1,j}$ and $\zeta_{ij}$ is an unknown constant. In a number of situations, we are interested in studying the structure of dependence between rows and columns if $p_{ij} \ne p_{i\cdot}p_{\cdot j}$. If we know the structure of dependence, we can estimate the unknown parameters more efficiently. Now, let $F = (f_{ij})$ where $f_{ij} = p_{ij}/\sqrt{p_{i\cdot}p_{\cdot j}}$. From the singular value decomposition of the matrix, it is known (e.g., see Lancaster (1969)) that

$F = \rho_0\, \xi_0^{*}\eta_0^{*\prime} + \sum_{u=1}^{r} \rho_u\, \xi_u^{*}\eta_u^{*\prime}$  (12.2)

where $\rho_0 \ge \cdots \ge \rho_r$ are the singular values of $F$ (the positive square roots of the eigenvalues of $FF'$), $\xi_u^{*}$ is the eigenvector of $FF'$ corresponding to $\rho_u^2$ and $\eta_u^{*}$ is the eigenvector of $F'F$ corresponding to $\rho_u^2$. Here $\rho_0 = 1$, $\xi_0^{*\prime} = (\sqrt{p_{1\cdot}}, \ldots, \sqrt{p_{r+1,\cdot}})$ and $\eta_0^{*\prime} = (\sqrt{p_{\cdot 1}}, \ldots, \sqrt{p_{\cdot,s+1}})$. We will now review the work of O'Neill (1978a, 1978b, 1980) and Bhaskara Rao, Krishnaiah and Subramanyam (1985) for testing for the rank of the matrix $\zeta = (\zeta_{ij})$. We also review the work of Bai, Zhao and Krishnaiah (1986) for determination of the rank of $\zeta$ by using model selection methods. Without loss of generality, we assume that $r \le s$ in the sequel.
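The decomposition (12.2) can be checked numerically on a small table. The following sketch assumes NumPy; the probability tables and the function name `dependence_svd` are made up for illustration. It forms $F$ from a table of cell probabilities, takes its singular value decomposition, and compares the leading singular triple with the square roots of the marginal probabilities.

```python
import numpy as np


def dependence_svd(P):
    """SVD of F = (p_ij / sqrt(p_i. p_.j)) for a table of cell probabilities."""
    P = np.asarray(P, dtype=float)
    row = P.sum(axis=1)                      # p_i.
    col = P.sum(axis=0)                      # p_.j
    F = P / np.sqrt(np.outer(row, col))
    U, rho, Vt = np.linalg.svd(F)            # rho is sorted in decreasing order
    return rho, U, Vt, row, col


# A dependent 3 x 3 table (hypothetical probabilities summing to one).
P = np.array([[0.10, 0.05, 0.05],
              [0.05, 0.20, 0.05],
              [0.05, 0.05, 0.40]])
rho, U, Vt, row, col = dependence_svd(P)
print(rho)                                   # leading value is 1
print(np.abs(U[:, 0]), np.sqrt(row))         # equal up to sign

# Under independence only the trivial singular value is nonzero.
P_indep = np.outer([0.2, 0.3, 0.5], [0.5, 0.25, 0.25])
rho_indep = dependence_svd(P_indep)[0]
print(rho_indep)
```

The number of nonzero singular values of $F$ equals the rank of the table of probabilities, since $F$ is obtained from it by rescaling rows and columns.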

Let $n_{ij}$ denote the frequency in the $i$-th row and $j$-th column, $n_{i\cdot} = n_{i1} + \cdots + n_{i,s+1}$ and $n_{\cdot j} = n_{1j} + \cdots + n_{r+1,j}$. Also, let $B = (b_{ij})$ where $b_{ij} = n_{ij}/\sqrt{n_{i\cdot}n_{\cdot j}}$. Now, let $\hat{\rho}_0^2 \ge \cdots \ge \hat{\rho}_r^2$ denote the eigenvalues of $BB'$ where $\hat{\rho}_0^2 = 1$. We assume that the total frequency $n = \sum_i \sum_j n_{ij}$ is fixed and the joint distribution of the cell frequencies is given by

$n! \prod_{i=1}^{r+1}\prod_{j=1}^{s+1} \dfrac{p_{ij}^{n_{ij}}}{n_{ij}!}.$  (12.3)


The classical test statistic for testing the hypothesis $p_{ij} = p_{i\cdot}p_{\cdot j}$ of independence is given by

$\chi_0^2 = \sum_{i=1}^{r+1}\sum_{j=1}^{s+1} n\,(n_{ij} - (n_{i\cdot}n_{\cdot j}/n))^2 / (n_{i\cdot}n_{\cdot j}).$  (12.4)

When the null hypothesis is true, $\chi_0^2$ is distributed asymptotically as chi-square with $rs$ degrees of freedom. The above hypothesis is equivalent to the hypothesis that $\rho_1^2 = \cdots = \rho_r^2 = 0$, and it can be tested by using $\hat{\rho}_1^2 + \cdots + \hat{\rho}_r^2$ as a test statistic. This test is equivalent to the chi-square test for independence since $\chi_0^2 = n(\hat{\rho}_1^2 + \cdots + \hat{\rho}_r^2)$. Now, let $H_t$ denote the hypothesis that $\rho_t^2 = 0$. This hypothesis is equivalent to the hypothesis that the rank of $\zeta$ is $t$. O'Neill (1978a, 1978b) showed that the joint asymptotic distribution of $n\hat{\rho}_1^2, \ldots, n\hat{\rho}_r^2$, when $H_1$ is true, is the same as the joint distribution of the eigenvalues of the central Wishart matrix $W$ with $s$ degrees of freedom and $E(W) = sI_r$. Tables for percentage points of the largest eigenvalue of the central Wishart matrix are given in Krishnaiah (1980).
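The identity between the chi-square statistic (12.4) and $n$ times the sum of the nontrivial eigenvalues of $BB'$ is easy to verify numerically. The sketch below assumes NumPy and uses a made-up $3 \times 4$ table of frequencies (so $r = 2$, $s = 3$); the function names are hypothetical.

```python
import numpy as np


def pearson_chi2(N):
    """Chi-square statistic for independence in a two-way table of counts."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / n   # n_i. n_.j / n
    return float(((N - expected) ** 2 / expected).sum())


def eigenvalues_BBt(N):
    """Eigenvalues of BB' in decreasing order, where b_ij = n_ij / sqrt(n_i. n_.j)."""
    N = np.asarray(N, dtype=float)
    B = N / np.sqrt(np.outer(N.sum(axis=1), N.sum(axis=0)))
    eig = np.linalg.eigvalsh(B @ B.T)[::-1]                  # symmetric matrix
    return np.clip(eig, 0.0, None)


N = np.array([[20, 10, 5, 5],
              [10, 25, 10, 5],
              [5, 10, 30, 15]])
n = N.sum()
eig = eigenvalues_BBt(N)          # eig[0] is the trivial root, equal to 1
chi2 = pearson_chi2(N)
print(eig[0], chi2, n * eig[1:].sum())
```

The agreement follows from the trace of $BB'$, which equals $1 + \chi_0^2/n$.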
We will now review the work of Bhaskara Rao, Krishnaiah and Subramanyam (1985) for determination of the rank of $\zeta$. They suggested functions of $\hat{\rho}_t^2, \ldots, \hat{\rho}_r^2$ as test statistics for testing $H_t$. For example, one may use $\hat{\rho}_t^2$ or $\hat{\rho}_t^2 + \cdots + \hat{\rho}_r^2$ as test statistics. The above authors also suggested the following simultaneous test procedure. We accept or reject $H_t$ according as

$n\hat{\rho}_t^2 \le c_\alpha$ or $n\hat{\rho}_t^2 > c_\alpha$  (12.5)

where

$P[n\hat{\rho}_1^2 \le c_\alpha \mid H_1] = (1-\alpha).$  (12.6)

If $H_t$ is accepted and $H_{t-1}$ is rejected, then the rank of $\zeta$ is $t$. Bhaskara Rao, Krishnaiah and Subramanyam (1985) derived the asymptotic joint distribution of functions of $\hat{\rho}_1^2, \ldots, \hat{\rho}_r^2$ when $\rho_1^2, \ldots, \rho_r^2$ have multiplicities. O'Neill (1978a) suggested using $n(\hat{\rho}_t^2 + \cdots + \hat{\rho}_r^2)$ as a test statistic for testing the hypothesis that the rank of $\zeta$ is $t$. In general, we can use a suitable function of $\hat{\rho}_t^2, \ldots, \hat{\rho}_r^2$, like the above test statistic or $n\hat{\rho}_t^2$, to test the hypothesis that the rank of $\zeta$ is $t$. But, unfortunately, the distributions of the above test statistics involve nuisance parameters even asymptotically. As an ad hoc procedure, one can replace the nuisance parameters with their consistent estimates.
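A rough sketch of the simultaneous procedure is given below. It is an illustration under several assumptions, not the cited authors' implementation: it uses NumPy, it replaces the published tables of percentage points by a Monte Carlo approximation to the upper quantile of the largest eigenvalue of a central Wishart matrix with identity scale, and it returns $r + 1$ when no $H_t$ is accepted. All function names, the number of replications and the example tables are hypothetical.

```python
import numpy as np


def wishart_largest_root_quantile(r, s, alpha, reps=20000, seed=0):
    """Monte Carlo (1 - alpha) quantile of the largest eigenvalue of an
    r x r central Wishart matrix with s degrees of freedom and identity scale."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((reps, s, r))
    W = np.swapaxes(Z, 1, 2) @ Z                 # reps stacked r x r Wishart draws
    largest = np.linalg.eigvalsh(W)[:, -1]       # eigvalsh sorts in ascending order
    return float(np.quantile(largest, 1.0 - alpha))


def nontrivial_roots(N):
    """rho_hat_1^2 >= ... >= rho_hat_r^2 (the trivial root 1 is dropped)."""
    N = np.asarray(N, dtype=float)
    B = N / np.sqrt(np.outer(N.sum(axis=1), N.sum(axis=0)))
    return np.linalg.svd(B, compute_uv=False)[1:] ** 2


def stepwise_rank(N, alpha=0.05):
    """Smallest t such that n * rho_hat_t^2 <= c_alpha, i.e. H_t is accepted
    after H_1, ..., H_{t-1} were rejected."""
    N = np.asarray(N, dtype=float)
    r, s = min(N.shape) - 1, max(N.shape) - 1
    c_alpha = wishart_largest_root_quantile(r, s, alpha)
    stats = N.sum() * nontrivial_roots(N)
    for t, stat in enumerate(stats, start=1):
        if stat <= c_alpha:
            return t
    return r + 1                                 # assumption: no H_t accepted


N_indep = np.outer([10, 20, 30], [1, 2, 3])      # exactly independent counts
N_dep = np.array([[100, 1, 1], [1, 100, 1], [1, 1, 100]])
print(stepwise_rank(N_indep), stepwise_rank(N_dep))
```

Because the critical value is calibrated under $H_1$ only, the later steps inherit the nuisance-parameter difficulty noted in the text.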

Bai, Krishnaiah and Zhao (1986b) proposed the following procedure for determination of the rank of $P = (p_{ij})$. Let

$G(k) = n \sum_{j=k+1}^{r} \hat{\rho}_j^2 + kC_n$  (12.7)

where $C_n$ satisfies the following conditions:

(i) $\lim_{n\to\infty} (C_n/n) = 0$

(ii) $\lim_{n\to\infty} (C_n/\log\log n) = \infty.$  (12.8)

Then, the unknown rank $q$ of $P$ is estimated with $\hat{q}$ where $\hat{q}$ is given by

$G(\hat{q}) = \min\{G(1), \ldots, G(r)\}.$

Bai, Krishnaiah and Zhao (1986b) showed that $\hat{q}$ is a consistent estimate of $q$.
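The criterion (12.7) is straightforward to evaluate once the roots $\hat{\rho}_j^2$ are available. The sketch below assumes NumPy and takes $C_n = \log n$, which is one choice satisfying both conditions in (12.8) but is our assumption rather than a recommendation from the cited paper. The index range of the minimization follows the formula as printed above; the function name and example tables are hypothetical.

```python
import numpy as np


def estimate_rank(N, C_n=None):
    """Minimize G(k) = n * sum_{j=k+1}^{r} rho_hat_j^2 + k * C_n over k = 1, ..., r."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    B = N / np.sqrt(np.outer(N.sum(axis=1), N.sum(axis=0)))
    # Squared singular values of B; the first one is the trivial root 1.
    rho_sq = np.linalg.svd(B, compute_uv=False)[1:] ** 2
    r = rho_sq.size
    if C_n is None:
        C_n = np.log(n)       # assumed penalty: C_n/n -> 0, C_n/loglog n -> infinity
    # rho_sq[k:] holds rho_hat_{k+1}^2, ..., rho_hat_r^2 (zero-based indexing).
    G = {k: n * rho_sq[k:].sum() + k * C_n for k in range(1, r + 1)}
    return min(G, key=G.get), G


N_indep = np.outer([10, 20, 30], [1, 2, 3])      # exactly independent counts
N_dep = np.array([[100, 1, 1], [1, 100, 1], [1, 1, 100]])
q_indep, G_indep = estimate_rank(N_indep)
q_dep, G_dep = estimate_rank(N_dep)
print(q_indep, q_dep)
```

For the independent table every nontrivial root vanishes, so the penalty term alone decides and the smallest $k$ is chosen; for the strongly dependent table the residual sum dominates until $k = r$.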

